I am a new Stan user, and for learning purposes I am trying to recode examples from the book Statistical Rethinking into base Stan, using the Stan Reference Manual to guide me. I am now trying to fit a very simple multilevel model with varying intercepts and slopes, where the average waiting time in different cafés is regressed on whether the observation was done in the morning or the afternoon. In lmer
or rstanarm
notation: wait ~ 1 + afternoon + (1 + afternoon | cafe)
.
Data attached (by the way, stan_rdump
returned an error saying the dimensions for all elements (array) of the list were not same, I used the normal dump
function).data.txt (6.2 KB)
This is the code automatically generated by the map2stan
function in the rethinking
package, which easily fits the model in a few minutes with 4 chains, 5000 iterations and 2000 warmup samples:
data {
int<lower=1> N;
int<lower=1> N_cafe;
real wait[N];
int cafe[N];
int afternoon[N];
}
parameters {
vector[N_cafe] b_cafe;
vector[N_cafe] a_cafe;
real a;
real b;
vector<lower=0>[2] sigma_cafe;
real<lower=0> sigma;
corr_matrix[2] Rho;
}
transformed parameters {
vector[2] v_a_cafeb_cafe[N_cafe];
vector[2] Mu_ab;
cov_matrix[2] SRS_sigma_cafeRho;
for ( j in 1:N_cafe ) {
v_a_cafeb_cafe[j,1] = a_cafe[j];
v_a_cafeb_cafe[j,2] = b_cafe[j];
}
for ( j in 1:2 ) {
Mu_ab[1] = a;
Mu_ab[2] = b;
}
SRS_sigma_cafeRho = quad_form_diag(Rho,sigma_cafe);
}
model {
vector[N] mu;
Rho ~ lkj_corr( 2 );
sigma ~ cauchy( 0 , 2 );
sigma_cafe ~ cauchy( 0 , 2 );
b ~ normal( 0 , 10 );
a ~ normal( 0 , 10 );
v_a_cafeb_cafe ~ multi_normal( Mu_ab , SRS_sigma_cafeRho );
for ( i in 1:N ) {
mu[i] = a_cafe[cafe[i]] + b_cafe[cafe[i]] * afternoon[i];
}
wait ~ normal( mu , sigma );
}
generated quantities {
vector[N] mu;
real dev;
dev = 0;
for ( i in 1:N ) {
mu[i] = a_cafe[cafe[i]] + b_cafe[cafe[i]] * afternoon[i];
}
dev = dev + (-2)*normal_lpdf( wait | mu , sigma );
}
My code instead is below. I wrote it based on chapter 9.13 of the Stan Manual, esp. pp. 149-150. The name of the variables are slightly different but they match those in the file I attached.
I thought my code was cleaner but it hangs and returns hundreds of divergent transitions, which makes me think that I made some mistakes:
data{
int<lower = 1> N; // 200 data points
int<lower = 1> J; // 20 groups (cafés)
int<lower = 1> K; // 2 predictors (intercept and 'afternoon', which is a binary variable)
int<lower =1, upper = J> cafe[N];
matrix[N, K] X;
vector[N] wait;
}
parameters{
corr_matrix[K] Omega;
vector<lower=0>[K] tau;
vector[K] beta[J];
real<lower=0> sigma;
vector[K] mu;
}
model {
tau ~ cauchy(0, 1);
sigma ~ cauchy(0, 1);
Omega ~ lkj_corr(2);
mu ~ normal(0, 10); // this should be a vector of length 2, right?
for (j in 1:J)
beta ~ multi_normal(mu, quad_form_diag(Omega, tau));
{
vector[N] X_beta_cafe;
for (n in 1:N)
X_beta_cafe[n] = X[n] * beta[cafe[n]];
wait ~ normal(X_beta_cafe, sigma);
}
}
generated quantities {
matrix[K, K] Sigma;
Sigma = quad_form_diag(Omega, tau);
}
Thank you for any help,
k.
Model below: