Bad estimates (possible misspecification of multilevel model)

crazyfrogspb · December 14, 2017, 7:45pm

Hi, I’ve been trying to find an error in my model for a few days, and I’m finally at the point where I need some help from community. I’m trying to fit a simple multilevel model with a single group-level predictor Z and a single individual predictor X. This is the way I specify the model:

bayes_model = "
data{
  int<lower=1> J; //number of clusters
  int<lower=1> N; //number of individuals
  int<lower=1,upper=J> clust[N]; // cluster for each individual
  
  vector[N] y; //vector of responses
  vector[N] X; //vector of individual covariates
  vector[J] Z; //vector of cluster characteristics
}

parameters {
  real<lower=0> sigma_clust; // cluster-level SD
  real<lower=0> sigma_ind; //individual-level SD
  
  real gamma0; // group-level intercepts
  real a[J]; //random effects
  real gamma1; // group-level slope
  real beta; // individual slope
}

transformed parameters {
  vector[J] alpha;
  vector[N] y_hat;

  for (i in 1:J) {
    alpha[i] = a[i] + Z[i] * gamma1;
  }

  for (i in 1:N) {
    y_hat[i] = alpha[clust[i]] + X[i] * beta;
  }
}

model {
  sigma_clust ~ inv_gamma(0.001, 0.001);  //priors
  sigma_ind ~ inv_gamma(0.001, 0.001);
  gamma0 ~ normal(0, 25);
  gamma1 ~ normal(0, 25);
  beta ~ normal(0, 25);

  a ~ normal(gamma0, sigma_clust);
  y ~ normal(y_hat, sigma_ind);
}
"

When I use simulated data with true values of parameters gamma0 = 0.0, gamma1 = 0.1, beta = 0.2, I get completely wrong estimates for gamma0 (far from 0) and gamma1, while lmer function in R is much more accurate. I’ve tried different parameters for number of iterations, maximum tree depth and adapt_delta, getting the same results. Is there some kind of obvious mistake, or maybe there is a way to reparametrize the model? Thanks a lot.

bbbales2 · December 14, 2017, 7:54pm

It sounds like your comfortable with the lme syntax and are curious about how these sorts of models might get implemented in a Stan model?

I’m not an expert on this stuff. The language is kinda foreign to me, but you might try brms: https://cran.r-project.org/web/packages/brms/index.html . That does the lme models, but there’s functions make_stancode and make_standata that you can use to see the actual Stan model it’s using. This might be a way to track down the differences in whatever models you’re using.

You can use optimizing vs. sampling to see the difference in the MLE vs. sampled posterior then.

Bob_Carpenter · January 9, 2018, 5:42am

We’ve been trying to discourage priors like these as they can lead to estimates getting pushed heavily out into the tails.

lmer is giving you a max marginal likelihood answer, whereas Stan’s giving you a Bayesian posterior. With Bayes, you want to check posterior interval coverage, not the point estimate (though the point estimates from posterior means should be as good or better than those from lme4).

You really need the non-centered parameterization to make this fly. See the manual.

Topic		Replies	Views
Help with misspecified model Modeling fitting-issues , specification	2	430	September 10, 2020
Modelling individual specific error-terms in three-level multilevel model Modeling	4	372	April 25, 2019
Multilevel Measurement Error Model with group-specific measurement error components Modeling specification	1	493	May 30, 2019
Large estimated standard deviation in multilevel model despite weakly informative prior Modeling	7	657	March 24, 2019
Non-nested multilevel model Modeling fitting-issues , hierarchical-model	3	526	January 17, 2021

Bad estimates (possible misspecification of multilevel model)

Related topics