# Modelling individual specific error-terms in three-level multilevel model

Currently, I am constructing a three-level model which looks like this: y_{ijt} = \beta_{0ij} + x_{ijt}'\beta_{fixed} + \epsilon_{ijt}. I constructed the model with the assumption that \varepsilon_{ijt} \sim N(0,\sigma^2), but now I want to model the variance \sigma^2 of the error-term \varepsilon_{ijt} to be i and j specific, so \sigma_{ij}^2. I do this by using the following model:

data {
// Define variables in data
.
.
.
int<lower=1> error_index[Ni];

// Continuous outcome
int y[Ni];
}

parameters {
// Define parameters to estimate
// Population intercept (a real number)
real beta_0;
// Fixed effects
vector[p] beta;

// Level-1 errors
real<lower=0> sigma_e0[Npars];

// Level-2 random effect
real u_0jk[Npars];
real<lower=0> sigma_u0jk;

// Level-3 random effect
real u_0k[Nk];
real<lower=0> sigma_u0k;
}

transformed parameters  {
// Varying intercepts
real beta_0jk[Npars];
real beta_0k[Nk];

// Individual mean
real mu[Ni];

// Varying intercepts definition
// Level-3 (10 level-3 random intercepts)
for (k in 1:Nk) {
beta_0k[k] = beta_0 + u_0k[k];
}
// Level-2 (100 level-2 random intercepts)
for (j in 1:Npars) {
beta_0jk[j] = beta_0k[countryLookup[j]] + u_0jk[j];
}
// Individual mean
for (i in 1:Ni) {
mu[i] = beta_0jk[countryMonthLookup[i]] + fixedEffectVars[i,]*beta;
}
}

model {
// Prior part of Bayesian inference
// Random effects distribution
u_0k  ~ normal(0, sigma_u0k);
u_0jk ~ normal(0, sigma_u0jk);

sigma_u0jk ~ student_t(3, 0, 10);
sigma_u0k ~ student_t(3, 0, 10);

sigma_e0 ~ student_t(3, 0, 141);

// Likelihood part of Bayesian inference
for (i in 1:Ni) {
y[i] ~ normal(mu[i], sigma_e0[error_index[i]]);
}
}


However, my model now takes hours more of computation time than when modelling only a constant variance \sigma^2 for the error-terms (replacing sigma_e0[error_index[i]] by just sigma_e0). Next to that, I get the message that all transitions after warm-up exceeded the maximum treedepth (10). Does anyone know how to solve these problems. Next to that I do not set priors on the covariance of the error-terms but that should not matter much, right?

Hi I am not sure I understand your model very well. There are some things that are weird however (although likely not related to your actual problem):

• student_t(3, 0, 10); student_t(3, 0, 141); - those are very wide priors, are you sure you need them this wide?
• you have no priors for beta and beta_0, consider adding some

Other than that I have only general advice I wrote on my blog (aimed at divergences, but should be useful here as well), especially try working with data simulated exactly as the model assumes: https://www.martinmodrak.cz/2018/02/19/taming-divergences-in-stan-models/

Do you have suggestions for other priors than the student t distribution, as a follow a paper from Andrew Gelman for the variance priors in hierarchical models?

You can check out https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations in any case, student_t(3,0,1) or even normal(0,1) might be good (if you have the time, running prior predictive checks to determine priors is the current best practice: https://arxiv.org/abs/1709.01449)