Help with multilevel model. Chains do not mix

Hi there,

Since my multilevel model does not converge properly in JAGS, I am trying to run it in Stan. The model runs, but the chains do not mix at all: each chain barely moves from where it started, and I am trying to figure out why.

The model is the following:

y_{i} \sim \text{Poisson}(\theta_{i}),\quad \text{for}\ i = 1,\dots,n,

\theta_{i} = e^{\alpha_{c[i]} + \beta_{c[i]}^{1}x_{i}^{1} + \beta^{2}x_{i}^{2} + \beta_{c[i]}^{3}x_{i}^{1}x_{i}^{2} + \epsilon_{i}},\quad \text{for}\ i = 1,\dots,n,

\begin{pmatrix}\alpha_{c}\\\beta_{c}^{1}\\\beta_{c}^{3}\end{pmatrix} \sim N\begin{pmatrix}\begin{pmatrix}\mu_{c}^{\alpha}\\\mu_{c}^{\beta^{1}}\\\mu_{c}^{\beta^{3}}\end{pmatrix}, \begin{pmatrix}\sigma_{\alpha}^{2}&\rho_{12}\sigma_{\alpha}\sigma_{\beta^{1}}&\rho_{13}\sigma_{\alpha}\sigma_{\beta^{3}}\\\rho_{12}\sigma_{\alpha}\sigma_{\beta^{1}}&\sigma_{\beta^{1}}^{2}&\rho_{23}\sigma_{\beta^{1}}\sigma_{\beta^{3}}\\\rho_{13}\sigma_{\alpha}\sigma_{\beta^{3}}&\rho_{23}\sigma_{\beta^{1}}\sigma_{\beta^{3}}&\sigma_{\beta^{3}}^{2}\end{pmatrix}\end{pmatrix},\quad \text{for}\ c = 1,\dots,C,

\mu_{c}^{\alpha} = \gamma_{0}^{\alpha} + \gamma_{1}^{\alpha}x_{c}^{3},\quad \text{for}\ c = 1,\dots,C,

\mu_{c}^{\beta^{1}} = \mu_{c}^{\beta^{3}} = \gamma_{0}^{\beta} + \gamma_{1}^{\beta}x_{c}^{3},\quad \text{for}\ c = 1,\dots,C.


The current Stan code for the model is:

data {
  int N;
  int ncoa; // Number of groups
  // group vector
  int coa[N];
  
  // DV
  int<lower=0> y[N];
  
  // Individual level variables
 real x1[N];
 real x2[N];
  
  // group level variable
  real x3[ncoa];
  matrix[3,3] W;
  real df;
}
parameters {
  
// individual level parameters
  real beta2;

  real epsilon[N];
  real<lower=0> sigma_epsilon;
 real a0_raw;
 real gamma_a_raw;
	real b0_raw;
	real gamma_b_raw;
	real c0_raw;
	real gamma_c_raw;
 
	matrix[ncoa,3] B_raw;
	matrix[3,3] Tau_B_raw;
	real xi_a;
	real xi_b;
	real xi_c;
  
}
transformed parameters {

  real lp[N];
  vector[ncoa] beta3;
  vector[ncoa] beta0;
  vector[ncoa] beta1;

	// parameters at group stage
	matrix[ncoa,3] B_raw_hat;
	matrix[3,3] Sigma_B_raw;
	matrix[3,3] rho_B;
	real sigma_a;
	real sigma_b;
	real sigma_c;
	
	
	real a0;
	real gamma_a;
	real b0;
	real gamma_b;
	real c0;
	real gamma_c;
  
// group linear predictors  
for(j in 1:ncoa){
	beta0[j] = xi_a * B_raw[j,1];
	beta1[j] = xi_b * B_raw[j,2];
	beta3[j] = xi_c * B_raw[j,3];
    B_raw_hat[j,1] = a0_raw + gamma_a_raw * x3[j];
    B_raw_hat[j,2] = b0_raw + gamma_b_raw * x3[j];
    B_raw_hat[j,3] = c0_raw + gamma_c_raw * x3[j];	
}
a0 = xi_a * a0_raw;
gamma_a = xi_a * gamma_a_raw;
b0 = xi_b * b0_raw;
gamma_b = xi_b * gamma_b_raw;
c0 = xi_c * c0_raw;
gamma_c = xi_c * gamma_c_raw;

Sigma_B_raw[,] = inverse(Tau_B_raw[,]);
for(k in 1:3){
  for(k_prime in 1:3){
    rho_B[k,k_prime] = Sigma_B_raw[k,k_prime]/
      sqrt(Sigma_B_raw[k,k]*Sigma_B_raw[k_prime,k_prime]);
  }
}
sigma_a = xi_a*sqrt(Sigma_B_raw[1,1]);
sigma_b = xi_b*sqrt(Sigma_B_raw[2,2]);
sigma_c = xi_c*sqrt(Sigma_B_raw[3,3]);


// Linear predictor
for(i in 1:N){
  lp[i] = exp(beta0[coa[i]] + 
  beta1[coa[i]] * x1[i] + 
  beta2 * x2[i] + 
  beta3[coa[i]] * x1[i] * x2[i] + 
  epsilon[i]);
}
}
model {

// priors
beta2 ~ normal(0,100);

a0_raw ~ normal(0,100);
b0_raw ~ normal(0,100);
c0_raw ~ normal(0,100);
gamma_a_raw ~ normal(0,100);
gamma_b_raw ~ normal(0,100);
gamma_c_raw ~ normal(0,100);

xi_a ~ uniform(0,100);
xi_b ~ uniform(0,100);
xi_c ~ uniform(0,100);

sigma_epsilon ~ uniform(0,100);

for(p in 1:ncoa){
B_raw[p,] ~ multi_normal(B_raw_hat[p,], Tau_B_raw[,]);
}

Tau_B_raw[,] ~ wishart(df,W[,]);

y ~ poisson(lp);

for(z in 1:N){
epsilon[z] ~ normal(0, sigma_epsilon);
}
}

Initial values for Tau_B_raw are drawn from a Wishart distribution. The matrix W is simply a 3 × 3 identity matrix, and df specifies that there are 4 degrees of freedom for the Wishart distribution.

I am very new to Stan and would be very happy if somebody could give me a hand. Thanks in advance for your guidance and your patience.

What is epsilon?

Epsilon accounts for overdispersion, as described by Gelman and Hill (2007, p. 325). The model is an overdispersed Poisson regression.
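In Stan, that overdispersion term is just a per-observation normal effect on the log rate. A minimal sketch of that piece alone (with a hypothetical `eta` standing in for the usual linear predictor on the log scale, declared elsewhere):

```stan
// Overdispersed Poisson sketch (Gelman & Hill 2007, ch. 15):
// a normal term epsilon[i] on the log rate absorbs extra-Poisson variation.
parameters {
  vector[N] epsilon;            // per-observation overdispersion
  real<lower=0> sigma_epsilon;  // overdispersion scale
}
model {
  epsilon ~ normal(0, sigma_epsilon);
  y ~ poisson_log(eta + epsilon);  // eta: linear predictor on the log scale
}
```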

First, I’d suggest indenting things for readability. Here’s the model with auto-indent from Emacs stan-mode:

data {
  int N;
  int ncoa; // Number of groups
  // group vector
  int coa[N];
  
  // DV
  int<lower=0> y[N];
  
  // Individual level variables
  real x1[N];
  real x2[N];
  
  // group level variable
  real x3[ncoa];
  matrix[3,3] W;
  real df;
}
parameters {
  
  // individual level parameters
  real beta2;

  real epsilon[N];
  real<lower=0> sigma_epsilon;
  real a0_raw;
  real gamma_a_raw;
  real b0_raw;
  real gamma_b_raw;
  real c0_raw;
  real gamma_c_raw;
 
  matrix[ncoa,3] B_raw;
  matrix[3,3] Tau_B_raw;
  real xi_a;
  real xi_b;
  real xi_c;
  
}
transformed parameters {

  real lp[N];
  vector[ncoa] beta3;
  vector[ncoa] beta0;
  vector[ncoa] beta1;

  // parameters at group stage
  matrix[ncoa,3] B_raw_hat;
  matrix[3,3] Sigma_B_raw;
  matrix[3,3] rho_B;
  real sigma_a;
  real sigma_b;
  real sigma_c;
	
	
  real a0;
  real gamma_a;
  real b0;
  real gamma_b;
  real c0;
  real gamma_c;
  
  // group linear predictors  
  for(j in 1:ncoa){
    beta0[j] = xi_a * B_raw[j,1];
    beta1[j] = xi_b * B_raw[j,2];
    beta3[j] = xi_c * B_raw[j,3];
    B_raw_hat[j,1] = a0_raw + gamma_a_raw * x3[j];
    B_raw_hat[j,2] = b0_raw + gamma_b_raw * x3[j];
    B_raw_hat[j,3] = c0_raw + gamma_c_raw * x3[j];	
  }
  a0 = xi_a * a0_raw;
  gamma_a = xi_a * gamma_a_raw;
  b0 = xi_b * b0_raw;
  gamma_b = xi_b * gamma_b_raw;
  c0 = xi_c * c0_raw;
  gamma_c = xi_c * gamma_c_raw;

  Sigma_B_raw[,] = inverse(Tau_B_raw[,]);
  for(k in 1:3){
    for(k_prime in 1:3){
      rho_B[k,k_prime] = Sigma_B_raw[k,k_prime]/
          sqrt(Sigma_B_raw[k,k]*Sigma_B_raw[k_prime,k_prime]);
    }
  }
  sigma_a = xi_a*sqrt(Sigma_B_raw[1,1]);
  sigma_b = xi_b*sqrt(Sigma_B_raw[2,2]);
  sigma_c = xi_c*sqrt(Sigma_B_raw[3,3]);


  // Linear predictor
  for(i in 1:N){
    lp[i] = exp(beta0[coa[i]] + 
                beta1[coa[i]] * x1[i] + 
                beta2 * x2[i] + 
                beta3[coa[i]] * x1[i] * x2[i] + 
                epsilon[i]);
  }
}
model {

  // priors
  beta2 ~ normal(0,100);

  a0_raw ~ normal(0,100);
  b0_raw ~ normal(0,100);
  c0_raw ~ normal(0,100);
  gamma_a_raw ~ normal(0,100);
  gamma_b_raw ~ normal(0,100);
  gamma_c_raw ~ normal(0,100);

  xi_a ~ uniform(0,100);
  xi_b ~ uniform(0,100);
  xi_c ~ uniform(0,100);

  sigma_epsilon ~ uniform(0,100);

  for(p in 1:ncoa){
    B_raw[p,] ~ multi_normal(B_raw_hat[p,], Tau_B_raw[,]);
  }

  Tau_B_raw[,] ~ wishart(df,W[,]);

  y ~ poisson(lp);

  for(z in 1:N){
    epsilon[z] ~ normal(0, sigma_epsilon);
  }
}

I’d suggest looking at the regression chapter in the user’s guide section of the manual for advice on parameterizing a multivariate normal prior—we don’t recommend those Wisharts.

The syntax being used in things like this is from BUGS:

Tau_B_raw[,] ~ wishart(df,W[,]);

It’s acceptable in Stan, but it’s clearer to just write

Tau_B_raw ~ wishart(df, W);

The multivariate normal sampling statement should be vectorized. And if you follow the advice in the manual, it will be in Cholesky factor form.
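For the vectorization alone (leaving the Wishart aside for the moment), `multi_normal` and its variants accept arrays of (row) vectors, so the loop over groups can collapse into one statement. Note also that Stan’s `multi_normal` takes a covariance matrix, so the BUGS-style precision matrix `Tau_B_raw` would go with `multi_normal_prec`. A sketch, assuming the declarations from the model above:

```stan
// One vectorized sampling statement instead of a loop over groups.
// multi_normal_prec matches the BUGS parameterization, where
// Tau_B_raw is a precision (inverse covariance) matrix.
{
  row_vector[3] B_rows[ncoa];
  row_vector[3] mu_rows[ncoa];
  for (p in 1:ncoa) {
    B_rows[p] = B_raw[p, ];
    mu_rows[p] = B_raw_hat[p, ];
  }
  B_rows ~ multi_normal_prec(mu_rows, Tau_B_raw);
}
```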

These are inconsistent:

parameters {
  real xi_a;
model {
  xi_a ~ uniform(0,100);

See the description at the beginning of the manual—constraints need to match support for all parameters.
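Concretely, the declared constraint has to match the support the prior implies:

```stan
parameters {
  real<lower=0, upper=100> xi_a;  // bounds now match the uniform(0,100) prior
}
model {
  xi_a ~ uniform(0, 100);  // with matching bounds, this line is even optional
}
```

With the bounds declared, the uniform prior over that interval is the default, so the sampling statement can be dropped entirely.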

Lots of other things can be vectorized here for efficiency.
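For instance, the per-observation loops each become one statement, and `poisson_log` avoids the explicit `exp()` (here a hypothetical `eta` holds the log-scale linear predictor):

```stan
// vectorized versions of the loops in the model above
epsilon ~ normal(0, sigma_epsilon);  // replaces the for(z in 1:N) loop
y ~ poisson_log(eta);                // eta[i] is the log rate; no exp() needed
```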

If you want to use W as an identity matrix, you don’t need to use the Wishart. Anyway, I’d recommend our LKJ Cholesky factorization of covariance.
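A sketch of that recommended scale-plus-LKJ-Cholesky parameterization for the 3 × 3 group-level covariance (names here are illustrative, not from the original model):

```stan
parameters {
  cholesky_factor_corr[3] L_Omega;  // Cholesky factor of the correlation matrix
  vector<lower=0>[3] tau;           // per-coefficient scales
  matrix[3, ncoa] z;                // standardized group effects
}
transformed parameters {
  // rows index coefficients, columns index groups; the implied
  // covariance is diag(tau) * L_Omega * L_Omega' * diag(tau)
  matrix[3, ncoa] B = diag_pre_multiply(tau, L_Omega) * z;
}
model {
  L_Omega ~ lkj_corr_cholesky(2);
  tau ~ cauchy(0, 2.5);
  to_vector(z) ~ normal(0, 1);      // non-centered: z is standard normal
}
```

Beyond replacing the Wishart, this non-centered form often mixes much better in hierarchical models than sampling the group effects directly.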

You should let Stan initialize if at all feasible.

I’d also recommend starting with a simpler model and building up. You’re likely running into problems of identifiability that your weak priors won’t mitigate. There’s a chapter on problematic posteriors in the manual that can help. Also, lots of discussion in Gelman and Hill on identifiability.
