I am trying to fit a *3 level hierarchical random intercept model*.

To understand how to tame such a beast I looted this code by Kazuki Yoshida.

So i tried his un-vectorized code (sorry no simulated data, but real data here)

```
classroom <- read.csv("http://www-personal.umich.edu/~bwest/classroom.csv")
schoolLookupVec <- unique(classroom[c("classid","schoolid")])[,"schoolid"]
dat <- with(classroom,
list(Ni = length(unique(childid)),
Nj = length(unique(classid)),
Nk = length(unique(schoolid)),
classid = classid,
schoolid = schoolid,
schoolLookup = schoolLookupVec,
mathgain = mathgain))
resStan <- stan(model_code = stan_code, data = dat, chains = 4, iter = 10000, warmup = 1000, thin = 10)
```

with the following stan code:

```
data {
// Define variables in data
// Number of level-1 observations (an integer)
int<lower=0> Ni;
// Number of level-2 clusters
int<lower=0> Nj;
// Number of level-3 clusters
int<lower=0> Nk;
// Cluster IDs
int<lower=1> classid[Ni];
int<lower=1> schoolid[Ni];
// Level 3 look up vector for level 2
int<lower=1> schoolLookup[Nj];
// Continuous outcome
real mathgain[Ni];
// Continuous predictor
// real X_1ijk[Ni];
}
parameters {
// Define parameters to estimate
// Population intercept (a real number)
real beta_0;
// Population slope
// real beta_1;
// Level-1 errors
real<lower=0> sigma_e0;
// Level-2 random effect
real u_0jk[Nj];
real<lower=0> sigma_u0jk;
// Level-3 random effect
real u_0k[Nk];
real<lower=0> sigma_u0k;
}
transformed parameters {
// Varying intercepts
real beta_0jk[Nj];
real beta_0k[Nk];
// Individual mean
real mu[Ni];
// Varying intercepts definition
// Level-3 (10 level-3 random intercepts)
for (k in 1:Nk) {
beta_0k[k] <- beta_0 + u_0k[k];
}
// Level-2 (100 level-2 random intercepts)
for (j in 1:Nj) {
beta_0jk[j] <- beta_0k[schoolLookup[j]] + u_0jk[j];
}
// Individual mean
for (i in 1:Ni) {
mu[i] <- beta_0jk[classid[i]];
}
}
model {
// Prior part of Bayesian inference
// Flat prior for mu (no need to specify if non-informative)
// Random effects distribution
u_0k ~ normal(0, sigma_u0k);
u_0jk ~ normal(0, sigma_u0jk);
// Likelihood part of Bayesian inference
// Outcome model N(mu, sigma^2) (use SD rather than Var)
for (i in 1:Ni) {
mathgain[i] ~ normal(mu[i], sigma_e0);
}
}
```

This model takes 574.467 seconds (Total) to run with all 4 chains where the estimated Bayesian Fraction of Missing Information was low. It is impossible to use the pairs() command to identify problems. And:

```
print(fit, pars = c("beta_0","sigma_e0","sigma_u0jk","sigma_u0k"))
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
beta_0 57.40 0.02 1.47 54.57 56.43 57.42 58.37 60.35 3494 1.00
sigma_e0 32.14 0.02 0.76 30.67 31.62 32.14 32.65 33.66 1854 1.00
sigma_u0jk 9.99 0.11 2.31 4.90 8.61 10.10 11.52 14.20 452 1.01
sigma_u0k 8.62 0.07 2.10 3.97 7.33 8.76 10.01 12.46 821 1.00
```

I tried a vectorized version of the same model (hoping in speeding up a little the model)

```
data {
// Define variables in data
// Number of level-1 observations (an integer)
int<lower=0> Ni;
// Number of level-2 clusters
int<lower=0> Nj;
// Number of level-3 clusters
int<lower=0> Nk;
// Cluster IDs
int<lower=1, upper=Nj> classid[Ni];
int<lower=1, upper=Nk> schoolid[Ni];
// Level 3 look up vector for level 2
int<lower=1, upper=Nk> schoolLookup[Nj];
// Continuous outcome
vector[Ni] mathgain;
}
parameters {
// Population intercept
real beta_0;
// Level-1 errors
real<lower=0> sigma_e0;
// Level-2 random effect
vector[Nj] u_0jk;
real<lower=0> sigma_u0jk;
// Level-3 random effect
vector[Nk] u_0k;
real<lower=0> sigma_u0k;
}
transformed parameters {
// Varying intercepts
vector[Nj] beta_0jk;
vector[Nk] beta_0k;
// Varying intercepts definition
// Level-3
beta_0k = beta_0 + u_0k;
// Level-2
beta_0jk = beta_0k[schoolLookup] + u_0jk;
}
model {
// Prior part of Bayesian inference
// Random effects distribution
u_0k ~ normal(0, sigma_u0k);
u_0jk ~ normal(0, sigma_u0jk);
// Likelihood part of Bayesian inference
mathgain ~ normal(beta_0jk[classid], sigma_e0);
}
```

Fitted values are similar (with better Rhat), but this vectorized model takes longer to sample (more or less it takes alway double of the unvectorized code:837.244 seconds (Total) while I expected an improvement) with the same warnings on low estimated Bayesian Fraction of Missing Information (which still I am not able to explore )

```
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
beta_0 57.37 0.02 1.44 54.56 56.42 57.35 58.35 60.14 3571 1
sigma_e0 32.15 0.02 0.78 30.68 31.60 32.14 32.68 33.67 1389 1
sigma_u0jk 9.95 0.13 2.47 4.56 8.37 10.15 11.62 14.38 354 1
sigma_u0k 8.47 0.09 2.19 3.45 7.19 8.60 9.89 12.54 593 1
```

Any advice to make my model run faster?

The QR reparametrization could be applied to this kind of models? Or should I try something else like adding some weakly informative priors on parameters?