So, with the help of @wds15, here is the vectorized code of the model with the non-centered parametrization.
I hope someone else will find it useful in the future. Many thanks to Kazuki Yoshida, who initially shared his code.
```stan
data {
  // Number of level-1 observations
  int<lower=0> Ni;
  // Number of level-2 clusters
  int<lower=0> Nj;
  // Number of level-3 clusters
  int<lower=0> Nk;
  // Cluster IDs
  int<lower=1, upper=Nj> classid[Ni];
  int<lower=1, upper=Nk> schoolid[Ni];
  // Level-3 lookup vector for level 2
  int<lower=1, upper=Nk> schoolLookup[Nj];
  // Continuous outcome
  vector[Ni] mathgain;
}
transformed data {
  // Standardized outcome
  vector[Ni] Y_ijk;
  Y_ijk = (mathgain - mean(mathgain)) / sd(mathgain);
}
parameters {
  // Population intercept
  real beta_0;
  // Level-1 error SD
  real<lower=0> sigma_e0;
  // Level-2 standardized random effect and its SD
  vector[Nj] u_0jk;
  real<lower=0> sigma_u0jk;
  // Level-3 standardized random effect and its SD
  vector[Nk] u_0k;
  real<lower=0> sigma_u0k;
}
transformed parameters {
  // Varying intercepts
  vector[Nj] beta_0jk;
  vector[Nk] beta_0k;
  // Level-3 varying intercepts, non-centered parametrization
  beta_0k = beta_0 + u_0k * sigma_u0k;
  // Level-2 varying intercepts, non-centered parametrization
  beta_0jk = beta_0k[schoolLookup] + u_0jk * sigma_u0jk;
}
model {
  // Priors
  beta_0 ~ normal(0, 1);
  sigma_e0 ~ normal(0, 1);
  sigma_u0k ~ normal(0, 1);
  sigma_u0jk ~ normal(0, 1);
  // Standardized random effects
  u_0k ~ normal(0, 1);
  u_0jk ~ normal(0, 1);
  // Likelihood
  Y_ijk ~ normal(beta_0jk[classid], sigma_e0);
}
```
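As a side note, the equivalence the non-centered parametrization relies on can be sketched in a few lines of NumPy (an illustration only, not part of the model): sampling beta ~ normal(mu, sigma) directly (centered) is distributionally the same as sampling u ~ normal(0, 1) and setting beta = mu + u * sigma, which is exactly what the transformed parameters block above does.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 2.0, 0.5, 200_000  # arbitrary toy values

# Centered: draw beta directly from its distribution
centered = rng.normal(mu, sigma, size=n)

# Non-centered: draw a standardized effect, then shift and scale,
# mirroring beta_0k = beta_0 + u_0k * sigma_u0k in the Stan code
u = rng.normal(0.0, 1.0, size=n)
non_centered = mu + u * sigma

# Both sets of draws target the same normal(mu, sigma) distribution
print(abs(centered.mean() - non_centered.mean()) < 0.01)
print(abs(centered.std() - non_centered.std()) < 0.01)
```

The payoff in Stan is geometric rather than distributional: the sampler explores the standardized u's, whose posterior geometry is much friendlier when the group-level SDs are small.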
It runs in 281.321 seconds (total), which is a great improvement over the initial version, with a higher effective sample size for the estimated model parameters.
```
4 chains, each with iter=10000; warmup=1000; thin=10;
post-warmup draws per chain=900, total post-warmup draws=3600.

            mean se_mean   sd  2.5%   25%  50%  75% 97.5% n_eff Rhat
beta_0      0.00       0 0.04 -0.09 -0.03 0.00 0.02  0.08  3573    1
sigma_e0    0.93       0 0.02  0.89  0.91 0.93 0.94  0.97  3436    1
sigma_u0jk  0.28       0 0.07  0.14  0.24 0.29 0.33  0.41  2456    1
sigma_u0k   0.25       0 0.06  0.11  0.21 0.25 0.29  0.36  2528    1
```
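For anyone puzzled by the `beta_0k[schoolLookup]` line: it expands the level-3 (school) intercepts out to level 2 (classes) with a single vectorized index, instead of a loop. A toy NumPy analogue with made-up sizes (note: 0-based indexing here, unlike Stan's 1-based):

```python
import numpy as np

# Hypothetical toy sizes: Nk = 3 schools, Nj = 5 classes
beta_0k = np.array([1.0, 2.0, 3.0])       # one varying intercept per school
schoolLookup = np.array([0, 0, 1, 2, 2])  # parent school of each class

# Integer-array indexing replicates each school's intercept once per class,
# mirroring beta_0k[schoolLookup] in the transformed parameters block
expanded = beta_0k[schoolLookup]
print(expanded)  # -> [1. 1. 2. 3. 3.]
```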