Hierarchical Stan model to imitate Stata results

Hi all,

I am new to Stan and could use some help replicating a hierarchical model that I have been estimating in Stata with the mixed command. After trying to replicate the results in Stan, I get very different estimates for some variables in the regression, but not for others.

What I am trying to do:
I am trying to fit panel data: several observations per individual per year, and individuals observed over several years. I would also like to model the sampling setup explicitly, with a random intercept at the region level, which results in a four-level hierarchy. For the time-varying effects I use a within estimator à la Mundlak (1978, Econometrica), and I want to account for time-invariant characteristics in the same regression. In Stata, I fit the model with something like the following code:

mixed y X Z || region: || respondent: || resp_year: ,

where X is a matrix of time-demeaned variables and their respondent-level means, and Z is a matrix of time-invariant characteristics. The variable resp_year is a sequential number by respondent and year, reflecting the hierarchical nesting of the observations.
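For reference, the pieces of X and the resp_year index can be constructed in Stata along these lines (a sketch with a hypothetical time-varying variable x):

    bysort respondent: egen x_mean = mean(x)    // respondent-level mean (Mundlak term)
    generate x_dm = x - x_mean                  // time-demeaned variable
    egen resp_year = group(respondent year)     // sequential respondent-year index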

The problem I am having:
I am having trouble replicating this model in Stan. At first I tried running it in R with lme4::lmer and nlme::lme, which also gave results different from Stata's. However, after looking around the web and trying several things, it turns out that these packages give results equivalent to those obtained by running the Stata command above with the reml option; that is, they seem to use an algorithm that resembles Stata's restricted maximum likelihood. The results from my Stan model, however, are very different from both, with coefficients almost an order of magnitude smaller than in Stata, and much smaller than in lme4 as well. Ideally, I would like to replicate the Stata results, which use the full maximum likelihood estimator.

Interestingly, the results are almost identical across Stata, R, and Stan for the time-varying part in X. The differences emerge only for the time-invariant part in Z. This may hint at a mis-specification of the error/hierarchical structure, but I am at a loss as to what it might be.

The Stan model I am using:
Here is a minimal example of the Stan model:

    model {
        vector[N] mu;

        // priors on scales and coefficients
        sigma ~ cauchy(0, 0.25);
        sigma_region ~ cauchy(0, 1);
        sigma_respondents ~ cauchy(0, 1);
        sigma_years ~ cauchy(0, 1);
        a ~ normal(0, 0.5);
        beta_tilde ~ normal(0, 0.25);

        // random intercepts
        omega ~ normal(0, sigma_region);
        theta ~ normal(0, sigma_years);
        rho ~ normal(0, sigma_respondents);

        // linear predictor and likelihood
        mu = a + Q * beta_tilde + omega[region] + rho[respondent] + theta[resp_year];
        y ~ normal(mu, sigma);
    }

The matrix Q is the Q matrix from a QR decomposition of the design matrix, which lumps together the time-varying and time-invariant parts above for convenience. The QR decomposition follows the manual:

    transformed data {
        real s = sqrt(N - 1.0);
        matrix[M, M] R = qr_thin_R(X) / s;
        matrix[N, M] Q = qr_thin_Q(X) * s;
        matrix[M, M] R_inv = inverse(R);
    }
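One note on comparing coefficients: beta_tilde lives on the Q scale, so its draws are not directly comparable to Stata's coefficients. For comparison with Stata, the original-scale coefficients would be recovered from beta_tilde via R_inv, as in the manual's QR example:

    generated quantities {
        // back-transform coefficients from the Q scale to the X scale
        vector[M] beta = R_inv * beta_tilde;
    }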

Any ideas what is going wrong in my Stan model and how I could get it to imitate the results from my Stata model?


Given that the model can be written in R's lmer notation, I recommend fitting it with rstanarm. You can take your lmer code directly and just change the call from lmer() to stan_glmer(). Ultimately there can be advantages to writing the model directly in Stan, since that gives you more flexibility, but with rstanarm you should be able to fit it right away, and you'll get the benefits of full Bayesian inference and simulation.
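For example, something along these lines (a sketch with placeholder names for your data frame and variables; x_dm/x_mean stand for the demeaned variables and their means):

    library(rstanarm)
    fit <- stan_glmer(
        y ~ x_dm + x_mean + z + (1 | region) + (1 | respondent) + (1 | resp_year),
        data = dat, family = gaussian()
    )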


Or you can fit it in brms; that should work just as well.


Thanks a lot for the tips, Andrew. The reason I want to learn to program this in Stan is that once the basic structure works, I want to apply it to a more complex model for which I will need Stan's flexibility. That said, using rstanarm and then obtaining the Stan code from it is a very good idea.

The problem, however, remains. After my models in Stata, lme4, and Stan, rstanarm now gives me a fourth set of estimates (brms is yet to converge). Although these numbers may not mean much out of context, let me give you an idea of the different estimates I obtain:

Stata (mixed): -1.388 (0.36)
R (lmer, lme): -1.003 (0.53)
Stan (my model above): -0.188 (0.44)
Stan (stan_glmer): -0.856 (0.62),

where the numbers in brackets are the standard error (or posterior standard deviation) of the reported coefficient. I am actually quite unsettled by these results. I am willing to discount my own Stan results, but that still leaves three sets of very different results from three software packages for one and the same model and data!

It would be good to understand what the problem could be, which estimates (if any!) might be the most reliable, and how one could get the different estimates to agree at least approximately. It could of course be that identification of the cross-sectional effects is tricky, given the regional intercepts combined with the between terms in the Mundlak model. However, none of the models shows any sign of convergence problems, so unless one runs all of them one might be none the wiser.

Any insights highly appreciated!


If you look at the confidence/credible intervals (\beta \pm 2 s.e.), you will see that the results are not as dramatically different as they first appear.

Stata (Mixed): [-2.1, -0.7]
R (lmer): [-2.1, 0.057]
Stan (stan_glmer): [-2.1, 0.384]

In addition, I think some of the differences can be attributed to the fact that the frequentist methods report the mode (the maximum likelihood estimate), while Stan reports the posterior mean, which is not necessarily the same quantity. I also think it is no coincidence that the smaller point estimates come with larger uncertainty. One potential explanation is that the HMC algorithm in Stan captures the uncertainty in the parameters more fully, and this shrinks the parameter estimates towards 0.


The next step in understanding this is to simulate fake data from a known model; then you can see how well the different programs recover the parameters. It could be that your Stan model has a bug in it, perhaps in the matrix algebra.
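For instance, a minimal fake-data check in R might look like this (a sketch with made-up dimensions and true values, dropping the year level for brevity; lme4 stands in for any of the fitters):

    library(lme4)
    set.seed(1)
    n_region <- 8; resp_per_region <- 25; n_year <- 5
    n_resp <- n_region * resp_per_region
    region     <- rep(1:n_region, each = resp_per_region * n_year)
    respondent <- rep(1:n_resp, each = n_year)
    x <- rnorm(n_resp * n_year)                      # time-varying covariate
    z <- rnorm(n_resp)[respondent]                   # time-invariant covariate
    u_region <- rnorm(n_region, 0, 1)[region]        # region intercepts
    u_resp   <- rnorm(n_resp, 0, 0.5)[respondent]    # respondent intercepts
    y <- 1 + 2 * x - 1.4 * z + u_region + u_resp + rnorm(n_resp * n_year, 0, 0.3)
    dat <- data.frame(y, x, z, region, respondent)
    fit <- lmer(y ~ x + z + (1 | region) + (1 | respondent), data = dat)
    fixef(fit)  # should recover roughly (1, 2, -1.4)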


OK, thank you. I will need to dig a little deeper here. If and when I find a solution to this issue I will make sure to post it in this thread.