# Having trouble fitting a simple varying intercept model

#1

I’m trying to learn Stan and generated a simple synthetic data set. The data set is generated with the following R code:

``````library(dplyr)
library(rstan)
set.seed(12445)
n_cats <- 10
n <- 1000
alpha_mu <- 10
alpha_tau <- 2
sigma <- 0.2
df <- data.frame(cat = sample(1:n_cats, n, replace = TRUE))
alpha <- data.frame(cat = 1:n_cats,
alpha = rnorm(n_cats, alpha_mu, alpha_tau))

df1 <- left_join(df, alpha, by = "cat") %>%
mutate(y = alpha + rnorm(n, 0, sigma)) %>%
tbl_df()

data <- list(n_cats = n_cats,
n_samples = n,
cats = df\$cat,
y = df1\$y)
``````

Essentially, I’m generating a mean (alpha) for every category from a reference distribution. The y values are generated by sampling from a distribution with that mean and a constant variance. In other words, this is the data generating process:

y ~ N(alpha_i, sigma)
alpha_i ~ N(alpha_mu, alpha_tau)

Fitting this model in lmer, recovers the correct parameters.

``````library(lme4)
lmer(y ~ (1 | cat), df1) %>% coef()
``````

Output from this is:

``````   (Intercept)
1     9.504471
2    10.239418
3    13.213756
4     6.875253
5    13.899031
6     7.953558
7     9.806968
8     8.089552
9    10.960454
10    8.809423
``````

This is very close to the correct parameters.

I then tried to fit this using stan. The stan program is shown below:

``````data {
int<lower=1> n_cats;
int<lower=0> n_samples;
int<lower=1> cats[n_samples];
real y[n_samples];
}
parameters {
real<lower=9, upper=11> alpha_mu;
real<lower=1, upper=3> alpha_tau;
real alpha[n_cats];
real<lower=0, upper=1.0> sigma;
}
model {
for (i in 1:n_cats) {
alpha[i] ~ normal(alpha_mu, alpha_tau);
}

for (i in 1:n_samples) {
y ~ normal(alpha[cats[i]], sigma);
}
}
``````

But when I look at the coefficients, all the alpha_i are equal to the grand mean.

``````                 mean se_mean
alpha_mu         9.95    0.01
alpha_tau        1.14    0.00
alpha[1]         9.96    0.00
alpha[2]         9.96    0.00
alpha[3]         9.96    0.00
alpha[4]         9.96    0.00
alpha[5]         9.96    0.00
alpha[6]         9.96    0.00
alpha[7]         9.96    0.00
alpha[8]         9.96    0.00
alpha[9]         9.96    0.00
alpha[10]        9.96    0.00
sigma            1.00    0.00
lp__      -2368724.67    0.10
``````

What am I doing wrong here?

#2

`y ~ normal(alpha[cats[i]], sigma);`

Should be:

`y[i] ~ normal(alpha[cats[i]], sigma);`

I think. See if that changes anything.

#3

Doh! Of course. Yup that worked. Thanks!

#4

`y ~ normal(alpha[cats], sigma);` without the `for` loop should work as well (that’s the vectorized way to write the likelihood).