Modelling individual specific error-terms in three-level multilevel model

Currently, I am constructing a three-level model which looks like this: y_{ijt} = \beta_{0ij} + x_{ijt}'\beta_{fixed} + \epsilon_{ijt}. I constructed the model with the assumption that \varepsilon_{ijt} \sim N(0,\sigma^2), but now I want to model the variance \sigma^2 of the error-term \varepsilon_{ijt} to be i and j specific, so \sigma_{ij}^2. I do this by using the following model:

data {
  // Define variables in data
.
.
.
  int<lower=1> error_index[Ni];
  
  // Continuous outcome
  int y[Ni];
}

parameters {
  // Define parameters to estimate
  // Population intercept (a real number)
  real beta_0;
  // Fixed effects
  vector[p] beta;
  
  // Level-1 errors
  real<lower=0> sigma_e0[Npars];
  
  // Level-2 random effect
  real u_0jk[Npars];
  real<lower=0> sigma_u0jk;
  
  // Level-3 random effect
  real u_0k[Nk];
  real<lower=0> sigma_u0k;
}

transformed parameters  {
  // Varying intercepts
  real beta_0jk[Npars];
  real beta_0k[Nk];
  
  // Individual mean
  real mu[Ni];
  
  // Varying intercepts definition
  // Level-3 (10 level-3 random intercepts)
  for (k in 1:Nk) {
    beta_0k[k] = beta_0 + u_0k[k];
  }
  // Level-2 (100 level-2 random intercepts)
  for (j in 1:Npars) {
    beta_0jk[j] = beta_0k[countryLookup[j]] + u_0jk[j];
  }
  // Individual mean
  for (i in 1:Ni) {
    mu[i] = beta_0jk[countryMonthLookup[i]] + fixedEffectVars[i,]*beta;
  }
}

model {
  // Prior part of Bayesian inference
  // Random effects distribution
  u_0k  ~ normal(0, sigma_u0k);
  u_0jk ~ normal(0, sigma_u0jk);
  
  sigma_u0jk ~ student_t(3, 0, 10);
  sigma_u0k ~ student_t(3, 0, 10);

  sigma_e0 ~ student_t(3, 0, 141);
  
  
    // Likelihood part of Bayesian inference
  for (i in 1:Ni) {
    y[i] ~ normal(mu[i], sigma_e0[error_index[i]]);
  }
}

However, my model now takes hours more of computation time than when modelling only a constant variance \sigma^2 for the error-terms (replacing sigma_e0[error_index[i]] by just sigma_e0). Next to that, I get the message that all transitions after warm-up exceeded the maximum treedepth (10). Does anyone know how to solve these problems. Next to that I do not set priors on the covariance of the error-terms but that should not matter much, right?

Hi I am not sure I understand your model very well. There are some things that are weird however (although likely not related to your actual problem):

  • student_t(3, 0, 10); student_t(3, 0, 141); - those are very wide priors, are you sure you need them this wide?
  • you have no priors for beta and beta_0, consider adding some

Other than that I have only general advice I wrote on my blog (aimed at divergences, but should be useful here as well), especially try working with data simulated exactly as the model assumes: https://www.martinmodrak.cz/2018/02/19/taming-divergences-in-stan-models/

Do you have suggestions for other priors than the student t distribution, as a follow a paper from Andrew Gelman for the variance priors in hierarchical models?

You can check out https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations in any case, student_t(3,0,1) or even normal(0,1) might be good (if you have the time, running prior predictive checks to determine priors is the current best practice: https://arxiv.org/abs/1709.01449)

@martinmodrak thanks for your reply.

What are the other things you don’t understand about my model. It could also be the case that because of the wide priors the sampling is difficult when using individual specific error-terms right?