Help with simple multilevel model code [solved]


#1

I am a new Stan user, and for learning purposes I am trying to recode examples from the book Statistical Rethinking into plain Stan, using the Stan Reference Manual as a guide. I am now trying to fit a very simple multilevel model with varying intercepts and slopes, where the average waiting time in different cafés is regressed on whether the observation was made in the morning or the afternoon. In lmer or rstanarm notation: wait ~ 1 + afternoon + (1 + afternoon | cafe).

Data attached. (By the way, stan_rdump returned an error saying the dimensions of the elements (arrays) of the list were not the same, so I used the normal dump() function instead.) data.txt (6.2 KB)

This is the code automatically generated by the map2stan function in the rethinking package, which easily fits the model in a few minutes with 4 chains, 5000 iterations, and 2000 warmup iterations:

data {
    int<lower=1> N;
    int<lower=1> N_cafe;
    real wait[N];
    int cafe[N];
    int afternoon[N];
}
parameters {
    vector[N_cafe] b_cafe;
    vector[N_cafe] a_cafe;
    real a;
    real b;
    vector<lower=0>[2] sigma_cafe;
    real<lower=0> sigma;
    corr_matrix[2] Rho;
}
transformed parameters {
    vector[2] v_a_cafeb_cafe[N_cafe];
    vector[2] Mu_ab;
    cov_matrix[2] SRS_sigma_cafeRho;
    for ( j in 1:N_cafe ) {
        v_a_cafeb_cafe[j,1] = a_cafe[j];
        v_a_cafeb_cafe[j,2] = b_cafe[j];
    }
    for ( j in 1:2 ) {
        Mu_ab[1] = a;
        Mu_ab[2] = b;
    }
    SRS_sigma_cafeRho = quad_form_diag(Rho,sigma_cafe);
}
model {
    vector[N] mu;
    Rho ~ lkj_corr( 2 );
    sigma ~ cauchy( 0 , 2 );
    sigma_cafe ~ cauchy( 0 , 2 );
    b ~ normal( 0 , 10 );
    a ~ normal( 0 , 10 );
    v_a_cafeb_cafe ~ multi_normal( Mu_ab , SRS_sigma_cafeRho );
    for ( i in 1:N ) {
        mu[i] = a_cafe[cafe[i]] + b_cafe[cafe[i]] * afternoon[i];
    }
    wait ~ normal( mu , sigma );
}
generated quantities {
    vector[N] mu;
    real dev;
    dev = 0;
    for ( i in 1:N ) {
        mu[i] = a_cafe[cafe[i]] + b_cafe[cafe[i]] * afternoon[i];
    }
    dev = dev + (-2)*normal_lpdf( wait | mu , sigma );
}

My code is below. I wrote it based on chapter 9.13 of the Stan Manual, esp. pp. 149-150. The variable names are slightly different, but they match those in the attached file.

I thought my code was cleaner, but it hangs and returns hundreds of divergent transitions, which makes me think I made some mistakes:

data{
  int<lower = 1> N; // 200 data points
  int<lower = 1> J; // 20 groups (cafés)
  int<lower = 1> K; // 2 predictors (intercept and 'afternoon', which is a binary variable)
  int<lower = 1, upper = J> cafe[N];
  matrix[N, K] X;
  vector[N] wait;
}
parameters{
  corr_matrix[K] Omega;
  vector<lower=0>[K] tau;
  vector[K] beta[J];
  real<lower=0> sigma;
  vector[K] mu;
}
model {
  tau ~ cauchy(0, 1);
  sigma ~ cauchy(0, 1);
  Omega ~ lkj_corr(2);
  mu ~ normal(0, 10); // this should be a vector of length 2, right?
  for (j in 1:J)
    beta ~ multi_normal(mu, quad_form_diag(Omega, tau));
  {
    vector[N] X_beta_cafe;
    for (n in 1:N)
      X_beta_cafe[n] = X[n] * beta[cafe[n]];
    wait ~ normal(X_beta_cafe, sigma);
  }
}
generated quantities {
  matrix[K, K] Sigma;
  Sigma = quad_form_diag(Omega, tau);
}

Thank you for any help,

k.



#2

I haven’t taken a close enough look to see why your code fails, but neither model seems ideal in terms of parameterization. Maybe take a look at what brms gives you via

brms::make_stancode(wait ~ 1 + afternoon + (1 + afternoon | cafe), yourData)


#3

Have a look at the j loop in the model section. I believe you need an index in the loop.
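To spell out what I mean (a sketch, assuming the rest of your model stays as it is): without the index, the sampling statement inside the loop applies the multivariate prior to the whole beta array on every pass, so the prior ends up being stated J times over.

```
// Sketch of the indexed version: each group's coefficient
// vector gets the multivariate normal prior exactly once.
for (j in 1:J)
  beta[j] ~ multi_normal(mu, quad_form_diag(Omega, tau));
```

You could also compute quad_form_diag(Omega, tau) once, outside the loop, so the covariance matrix is not rebuilt for every group.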


#4

Thanks both. @stijn, well spotted: I changed it and now it works.

@paul.buerkner, for reference:

// generated with brms 2.1.0
functions { 
} 
data { 
  int<lower=1> N;  // total number of observations 
  vector[N] Y;  // response variable 
  int<lower=1> K;  // number of population-level effects 
  matrix[N, K] X;  // population-level design matrix 
  // data for group-level effects of ID 1 
  int<lower=1> J_1[N]; 
  int<lower=1> N_1; 
  int<lower=1> M_1; 
  vector[N] Z_1_1; 
  vector[N] Z_1_2; 
  int<lower=1> NC_1; 
  int prior_only;  // should the likelihood be ignored? 
} 
transformed data { 
  int Kc = K - 1; 
  matrix[N, K - 1] Xc;  // centered version of X 
  vector[K - 1] means_X;  // column means of X before centering 
  for (i in 2:K) { 
    means_X[i - 1] = mean(X[, i]); 
    Xc[, i - 1] = X[, i] - means_X[i - 1]; 
  } 
} 
parameters { 
  vector[Kc] b;  // population-level effects 
  real temp_Intercept;  // temporary intercept 
  real<lower=0> sigma;  // residual SD 
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations 
  matrix[M_1, N_1] z_1;  // unscaled group-level effects 
  // cholesky factor of correlation matrix 
  cholesky_factor_corr[M_1] L_1; 
} 
transformed parameters { 
  // group-level effects 
  matrix[N_1, M_1] r_1 = (diag_pre_multiply(sd_1, L_1) * z_1)'; 
  vector[N_1] r_1_1 = r_1[, 1]; 
  vector[N_1] r_1_2 = r_1[, 2]; 
} 
model { 
  vector[N] mu = Xc * b + temp_Intercept; 
  for (n in 1:N) { 
    mu[n] = mu[n] + (r_1_1[J_1[n]]) * Z_1_1[n] + (r_1_2[J_1[n]]) * Z_1_2[n]; 
  } 
  // priors including all constants 
  target += student_t_lpdf(temp_Intercept | 3, 3, 10); 
  target += student_t_lpdf(sigma | 3, 0, 10)
    - 1 * student_t_lccdf(0 | 3, 0, 10); 
  target += student_t_lpdf(sd_1 | 3, 0, 10)
    - 2 * student_t_lccdf(0 | 3, 0, 10); 
  target += lkj_corr_cholesky_lpdf(L_1 | 1); 
  target += normal_lpdf(to_vector(z_1) | 0, 1); 
  // likelihood including all constants 
  if (!prior_only) { 
    target += normal_lpdf(Y | mu, sigma); 
  } 
} 
generated quantities { 
  // actual population-level intercept 
  real b_Intercept = temp_Intercept - dot_product(means_X, b); 
  corr_matrix[M_1] Cor_1 = multiply_lower_tri_self_transpose(L_1); 
  vector<lower=-1,upper=1>[NC_1] cor_1; 
  // take only relevant parts of correlation matrix 
  cor_1[1] = Cor_1[1,2]; 
} 

What does // data for group-level effects of ID 1 mean?

Thanks

k.


#5

Don’t worry about that. In your case it just means the data required for
(1 + afternoon | cafe)
That is, J_1 is the café index for each observation, N_1 the number of cafés, M_1 the number of varying coefficients per café (intercept and slope, so 2), and Z_1_1 and Z_1_2 the corresponding group-level design columns.


#6

By the way, I have fitted the model using brms. I think I set the priors correctly, as specified above, using

bp <- c(set_prior("normal(0, 10)", class = "Intercept"),
        set_prior("normal(0, 10)", class = "b", coef = "afternoon"),
        set_prior("cauchy(0, 2)", class = "sd", group = "cafe", coef = "Intercept"),
        set_prior("cauchy(0, 2)", class = "sd", group = "cafe", coef = "afternoon"),
        set_prior("lkj(2)", class = "cor"))

Well, the sampling took a tenth of the time, about 6 seconds compared to 60 seconds with my code, which is kind of amazing. And there were no divergences. I will study the brms code carefully to learn from it, but if you have any suggestions in the meantime, please do share them.
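In case it is useful for comparison: the speed-up presumably comes largely from the non-centered parameterization that brms uses, i.e. sampling a standardized matrix z_1 and scaling it with diag_pre_multiply(sd_1, L_1), with a Cholesky-factored correlation matrix instead of a full corr_matrix. A sketch of how my model could be rewritten along those lines (the variable names here are mine, and the data block is assumed unchanged):

```
parameters {
  cholesky_factor_corr[K] L_Omega; // Cholesky factor of the correlation matrix
  vector<lower=0>[K] tau;          // group-level standard deviations
  matrix[K, J] z;                  // standardized (unit-normal) group-level effects
  vector[K] mu;                    // population-level means
  real<lower=0> sigma;
}
transformed parameters {
  // Non-centered: beta[j] = mu + diag(tau) * L_Omega * z[, j]
  matrix[J, K] beta = rep_matrix(mu', J) + (diag_pre_multiply(tau, L_Omega) * z)';
}
model {
  to_vector(z) ~ normal(0, 1);
  L_Omega ~ lkj_corr_cholesky(2);
  tau ~ cauchy(0, 1);
  mu ~ normal(0, 10);
  sigma ~ cauchy(0, 1);
  {
    vector[N] X_beta_cafe;
    for (n in 1:N)
      X_beta_cafe[n] = X[n] * beta[cafe[n]]';
    wait ~ normal(X_beta_cafe, sigma);
  }
}
```

The correlation matrix itself can then be recovered in generated quantities with multiply_lower_tri_self_transpose(L_Omega).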

Thanks,

k.