# Bad estimates (possible misspecification of multilevel model)

Hi, I’ve been trying to find an error in my model for a few days, and I’m finally at the point where I need some help from the community. I’m trying to fit a simple multilevel model with a single group-level predictor Z and a single individual-level predictor X. This is how I specify the model:

```
bayes_model = "
data {
  int<lower=1> J;                 // number of clusters
  int<lower=1> N;                 // number of individuals
  int<lower=1,upper=J> clust[N];  // cluster for each individual

  vector[N] y;                    // vector of responses
  vector[N] X;                    // vector of individual covariates
  vector[J] Z;                    // vector of cluster characteristics
}

parameters {
  real<lower=0> sigma_clust;      // cluster-level SD
  real<lower=0> sigma_ind;        // individual-level SD

  real gamma0;                    // group-level intercept
  real a[J];                      // random effects
  real gamma1;                    // group-level slope
  real beta;                      // individual slope
}

transformed parameters {
  vector[J] alpha;
  vector[N] y_hat;

  for (i in 1:J) {
    alpha[i] = a[i] + Z[i] * gamma1;
  }

  for (i in 1:N) {
    y_hat[i] = alpha[clust[i]] + X[i] * beta;
  }
}

model {
  sigma_clust ~ inv_gamma(0.001, 0.001);  // priors
  sigma_ind ~ inv_gamma(0.001, 0.001);
  gamma0 ~ normal(0, 25);
  gamma1 ~ normal(0, 25);
  beta ~ normal(0, 25);

  a ~ normal(gamma0, sigma_clust);
  y ~ normal(y_hat, sigma_ind);
}
"
```

When I fit to simulated data with true parameter values gamma0 = 0.0, gamma1 = 0.1, beta = 0.2, I get completely wrong estimates for gamma0 (far from 0) and gamma1, while the `lmer` function in R is much more accurate. I’ve tried different settings for the number of iterations, the maximum tree depth, and `adapt_delta`, and get the same results. Is there some kind of obvious mistake, or maybe a way to reparameterize the model? Thanks a lot.
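For reference, here is a minimal R sketch of a data-generating process matching the data block above, and the kind of `lmer` fit being compared against. The seed, sample sizes, and SD values are my own assumptions, not taken from the original post:

```
set.seed(123)
J <- 50; N <- 500
gamma0 <- 0.0; gamma1 <- 0.1; beta <- 0.2
sigma_clust <- 0.5; sigma_ind <- 1.0

clust <- sample(1:J, N, replace = TRUE)  # cluster for each individual
Z <- rnorm(J)                            # cluster-level covariate
X <- rnorm(N)                            # individual-level covariate

a <- rnorm(J, gamma0, sigma_clust)       # random intercepts
alpha <- a + gamma1 * Z
y <- rnorm(N, alpha[clust] + beta * X, sigma_ind)

# The comparable lmer fit (Z expanded to one row per individual):
library(lme4)
d <- data.frame(y, X, Z = Z[clust], clust = factor(clust))
fit <- lmer(y ~ X + Z + (1 | clust), data = d)
```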

It sounds like you’re comfortable with the lmer syntax and are curious about how these sorts of models get implemented in a Stan model?

I’m not an expert on this stuff. The language is kind of foreign to me, but you might try brms: https://cran.r-project.org/web/packages/brms/index.html . It fits lmer-style models, and there are functions `make_stancode` and `make_standata` that you can use to see the actual Stan model it’s using. This might be a way to track down the differences between whatever models you’re comparing.
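Concretely, that would look something like this, assuming a data frame `d` with columns `y`, `X`, `Z` (repeated per individual), and `clust` as in the data block above:

```
library(brms)
# Print the Stan program brms generates for the equivalent model:
make_stancode(y ~ X + Z + (1 | clust), data = d)
# And the data list it would pass to Stan:
make_standata(y ~ X + Z + (1 | clust), data = d)
```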

You can then use `optimizing` vs. `sampling` to see the difference between the MLE and the sampled posterior.
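With rstan, that comparison might look like the following sketch, assuming `stan_data` is a list with the entries from the model’s data block:

```
library(rstan)
m <- stan_model(model_code = bayes_model)
stan_data <- list(J = J, N = N, clust = clust, y = y, X = X, Z = Z)

mle  <- optimizing(m, data = stan_data)  # point estimate (penalized MLE)
post <- sampling(m, data = stan_data)    # full posterior

mle$par["gamma1"]
summary(post, pars = c("gamma0", "gamma1", "beta"))$summary
```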

We’ve been trying to discourage priors like `inv_gamma(0.001, 0.001)`, as they can push estimates heavily out into the tails.

`lmer` is giving you a max marginal likelihood answer, whereas Stan’s giving you a Bayesian posterior. With Bayes, you want to check posterior interval coverage, not just the point estimate (though the point estimates from posterior means should be as good as or better than those from lme4).

You really need the non-centered parameterization to make this fly. See the reparameterization chapter of the Stan manual.
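For concreteness, here is a sketch of the non-centered version of the `parameters`, `transformed parameters`, and `model` blocks. The random effects are declared as standardized deviates `a_raw` and scaled and shifted inside `transformed parameters`, so the sampler works on a unit-scale geometry; the priors are left as in the original (though see the note above about the inverse-gamma priors):

```
parameters {
  real<lower=0> sigma_clust;    // cluster-level SD
  real<lower=0> sigma_ind;      // individual-level SD
  real gamma0;                  // group-level intercept
  real gamma1;                  // group-level slope
  real beta;                    // individual slope
  vector[J] a_raw;              // standardized random effects
}

transformed parameters {
  vector[J] alpha;
  vector[N] y_hat;

  for (i in 1:J) {
    alpha[i] = gamma0 + sigma_clust * a_raw[i] + Z[i] * gamma1;
  }
  for (i in 1:N) {
    y_hat[i] = alpha[clust[i]] + X[i] * beta;
  }
}

model {
  a_raw ~ normal(0, 1);         // implies a ~ normal(gamma0, sigma_clust)
  sigma_clust ~ inv_gamma(0.001, 0.001);
  sigma_ind ~ inv_gamma(0.001, 0.001);
  gamma0 ~ normal(0, 25);
  gamma1 ~ normal(0, 25);
  beta ~ normal(0, 25);

  y ~ normal(y_hat, sigma_ind);
}
```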