# Help with simple multilevel model code [solved]

I am a new Stan user, and for learning purposes I am trying to recode examples from the book Statistical Rethinking into base Stan, using the Stan Reference Manual to guide me. I am now trying to fit a very simple multilevel model with varying intercepts and slopes, where the average waiting time in different cafés is regressed on whether the observation was done in the morning or the afternoon. In `lmer` or `rstanarm` notation: `wait ~ 1 + afternoon + (1 + afternoon | cafe)`.

Data attached (by the way, `stan_rdump` returned an error saying the dimensions for all elements (array) of the list were not same, I used the normal `dump` function).data.txt (6.2 KB)

This is the code automatically generated by the `map2stan` function in the `rethinking` package, which easily fits the model in a few minutes with 4 chains, 5000 iterations and 2000 warmup samples:

``````data {
int<lower=1> N;
int<lower=1> N_cafe;
real wait[N];
int cafe[N];
int afternoon[N];
}
parameters {
vector[N_cafe] b_cafe;
vector[N_cafe] a_cafe;
real a;
real b;
vector<lower=0> sigma_cafe;
real<lower=0> sigma;
corr_matrix Rho;
}
transformed parameters {
vector v_a_cafeb_cafe[N_cafe];
vector Mu_ab;
cov_matrix SRS_sigma_cafeRho;
for ( j in 1:N_cafe ) {
v_a_cafeb_cafe[j,1] = a_cafe[j];
v_a_cafeb_cafe[j,2] = b_cafe[j];
}
for ( j in 1:2 ) {
Mu_ab = a;
Mu_ab = b;
}
}
model {
vector[N] mu;
Rho ~ lkj_corr( 2 );
sigma ~ cauchy( 0 , 2 );
sigma_cafe ~ cauchy( 0 , 2 );
b ~ normal( 0 , 10 );
a ~ normal( 0 , 10 );
v_a_cafeb_cafe ~ multi_normal( Mu_ab , SRS_sigma_cafeRho );
for ( i in 1:N ) {
mu[i] = a_cafe[cafe[i]] + b_cafe[cafe[i]] * afternoon[i];
}
wait ~ normal( mu , sigma );
}
generated quantities {
vector[N] mu;
real dev;
dev = 0;
for ( i in 1:N ) {
mu[i] = a_cafe[cafe[i]] + b_cafe[cafe[i]] * afternoon[i];
}
dev = dev + (-2)*normal_lpdf( wait | mu , sigma );
}
``````

My code instead is below. I wrote it based on chapter 9.13 of the Stan Manual, esp. pp. 149-150. The name of the variables are slightly different but they match those in the file I attached.

I thought my code was cleaner but it hangs and returns hundreds of divergent transitions, which makes me think that I made some mistakes:

``````data{
int<lower = 1> N; // 200 data points
int<lower = 1> J; // 20 groups (cafés)
int<lower = 1> K; // 2 predictors (intercept and 'afternoon', which is a binary variable)
int<lower =1, upper = J> cafe[N];
matrix[N, K] X;
vector[N] wait;
}
parameters{
corr_matrix[K] Omega;
vector<lower=0>[K] tau;
vector[K] beta[J];
real<lower=0> sigma;
vector[K] mu;
}
model {
tau ~ cauchy(0, 1);
sigma ~ cauchy(0, 1);
Omega ~ lkj_corr(2);
mu ~ normal(0, 10); // this should be a vector of length 2, right?
for (j in 1:J)
{
vector[N] X_beta_cafe;
for (n in 1:N)
X_beta_cafe[n] = X[n] * beta[cafe[n]];
wait ~ normal(X_beta_cafe, sigma);
}
}
generated quantities {
matrix[K, K] Sigma;
}
``````

Thank you for any help,

k.

Model below:

I haven’t taken a close enough look to see why your code fails, but both models seem not ideal in terms of parameterization. Maybe take a look at what brms gives you via

brms::make_stancode(ait ~ 1 + afternoon + (1 + afternoon | cafe), yourData)

Have a look at the j loop in the model section. I believe you need indexes in the loop.

Thanks both. @stijn , well spotted, I changed it and now it works.

@paul.buerkner, for reference:

``````// generated with brms 2.1.0
functions {
}
data {
int<lower=1> N;  // total number of observations
vector[N] Y;  // response variable
int<lower=1> K;  // number of population-level effects
matrix[N, K] X;  // population-level design matrix
// data for group-level effects of ID 1
int<lower=1> J_1[N];
int<lower=1> N_1;
int<lower=1> M_1;
vector[N] Z_1_1;
vector[N] Z_1_2;
int<lower=1> NC_1;
int prior_only;  // should the likelihood be ignored?
}
transformed data {
int Kc = K - 1;
matrix[N, K - 1] Xc;  // centered version of X
vector[K - 1] means_X;  // column means of X before centering
for (i in 2:K) {
means_X[i - 1] = mean(X[, i]);
Xc[, i - 1] = X[, i] - means_X[i - 1];
}
}
parameters {
vector[Kc] b;  // population-level effects
real temp_Intercept;  // temporary intercept
real<lower=0> sigma;  // residual SD
vector<lower=0>[M_1] sd_1;  // group-level standard deviations
matrix[M_1, N_1] z_1;  // unscaled group-level effects
// cholesky factor of correlation matrix
cholesky_factor_corr[M_1] L_1;
}
transformed parameters {
// group-level effects
matrix[N_1, M_1] r_1 = (diag_pre_multiply(sd_1, L_1) * z_1)';
vector[N_1] r_1_1 = r_1[, 1];
vector[N_1] r_1_2 = r_1[, 2];
}
model {
vector[N] mu = Xc * b + temp_Intercept;
for (n in 1:N) {
mu[n] = mu[n] + (r_1_1[J_1[n]]) * Z_1_1[n] + (r_1_2[J_1[n]]) * Z_1_2[n];
}
// priors including all constants
target += student_t_lpdf(temp_Intercept | 3, 3, 10);
target += student_t_lpdf(sigma | 3, 0, 10)
- 1 * student_t_lccdf(0 | 3, 0, 10);
target += student_t_lpdf(sd_1 | 3, 0, 10)
- 2 * student_t_lccdf(0 | 3, 0, 10);
target += lkj_corr_cholesky_lpdf(L_1 | 1);
target += normal_lpdf(to_vector(z_1) | 0, 1);
// likelihood including all constants
if (!prior_only) {
target += normal_lpdf(Y | mu, sigma);
}
}
generated quantities {
// actual population-level intercept
real b_Intercept = temp_Intercept - dot_product(means_X, b);
corr_matrix[M_1] Cor_1 = multiply_lower_tri_self_transpose(L_1);
vector<lower=-1,upper=1>[NC_1] cor_1;
// take only relevant parts of correlation matrix
cor_1 = Cor_1[1,2];
}
``````

What does `// data for group-level effects of ID 1` mean?

Thanks

k.

don’t worry about that. in your case it just means the data required for
(1 + afternoon | cafe)

By the way, I have fitted the model using `brms`. I think I set the priors correctly as specifieed above using

``````bp <- c(set_prior("normal(0, 10)", class = "Intercept"),
set_prior("normal(0, 10)", class = "b", coef = "afternoon"),
set_prior("cauchy(0, 2)", class = "sd", group = "cafe", coef = "Intercept"),
set_prior("cauchy(0, 2)", class = "sd", group = "cafe", coef = "afternoon"),
set_prior("lkj(2)", class = "cor"))
``````

Well, the sampling took a tenth of the time, just about 6 seconds compared to 60 seconds with my code, which is kind of amazing. And no divergences. I will study `brms` code carefully to learn, but if you had any suggestions for the moment, please do share them.

Thanks,

k.