Nested effects in Stan vs. stan_lmer()

Following the general advice of starting with something simple before building the full model, I wanted to verify that I understood how to write a simple linear regression with a varying intercept. Here’s the Stan code:

data {
  int n_species;
  int n_sites;
  int n_obs;
  int species[n_obs];
  int site[n_obs];
  real intercept_scale;
  real beta_scale;
  real sigma_scale;
  vector[n_obs] x;
  vector[n_obs] y;
}

parameters {
  real beta_0;
  real beta_1;
  real sigma;
  real sigma_species;
  real sigma_species_site;
  vector[n_species] beta_species;
  matrix[n_species,n_sites] beta_species_site;
}

transformed parameters {
  vector[n_obs] mu_obs;
  vector[n_species] mu_species;
  matrix[n_species,n_sites] mu_species_site;

  for (j in 1:n_species) {
    mu_species[j] = beta_0 + beta_species[j];
    for (k in 1:n_sites) {
      mu_species_site[j,k] = mu_species[j] + beta_species_site[j,k];
    }
  }
  for (i in 1:n_obs) {
    mu_obs[i] = mu_species_site[species[i], site[i]]
                + beta_1*x[i];
  }
}

model {
  for (i in 1:n_obs) {
    y[i] ~ normal(mu_obs[i], sigma);
  }
  beta_0 ~ normal(0.0, intercept_scale);
  for (j in 1:n_species) {
    beta_species[j] ~ normal(0.0, sigma_species);
    for (k in 1:n_sites) {
      beta_species_site[j,k] ~ normal(0.0, sigma_species_site);
    }
  }
  beta_1 ~ normal(0.0, beta_scale);
  sigma ~ cauchy(0.0, sigma_scale);
  sigma_species ~ gamma(1.0, 1.0); //cauchy(0.0, sigma_scale);
  sigma_species_site ~ gamma(1.0, 1.0); //cauchy(0.0, sigma_scale);
}

To check my understanding, I compared the results I obtained from running this code with those I obtained from running stan_lmer() on the same data:

  fit.lmer <- stan_lmer(y ~ x + (1|species/site),
                        data=dat,
                        prior=normal(0.0, beta_scale),
                        prior_intercept=normal(0.0, intercept_scale),
                        prior_aux=cauchy(0.0, sigma_scale),
                        prior_covariance=decov(regularization=regularize,
                                               concentration=1.0,
                                               shape=1.0,
                                               scale=1.0),
                        adapt_delta=0.99)

Estimates for the intercept, regression coefficient (slope), and residual standard deviation match pretty closely. The estimates for the random effect standard deviations and the corresponding random effects differ quite a bit.

In the simulations I’ve done (code in Github at https://kholsinger.github.io/mixed-models/) the number of species is set to 5 and the number of sites is set to 3. I’m not surprised that the estimates of the random effect standard deviations would be well off the mark. I was surprised at the differences between my implementation in STan and the results from stan_lmer(). The difference in results must be because of a difference in the prior on the random effects, but as you’ll see if you look at the code in Github, I’ve tried several different prior options, and I can’t get them to match.

For my purposes, I’m only interested in the regression coefficients, but I’d appreciate any insight on the difference between the priors as I’ve implemented them in Stan and as they’re implemented in stan_lmer(). As I build up the more complex model I’m working on, I need to get the random effects right since the number of units at each level is so small.

I’m not good with the lmer formula stuff, but if you want to go step by step through something like this then maybe try the brms package (https://cran.r-project.org/web/packages/brms/index.html).

There’s a function make_stancode that you can use to look at the Stan code brms is using to fit your model (and similarly make_standata that you can use to see how it’s shuffling the data from your dataframes around).

Might be easier might not. Hope that helps! (I didn’t see any weirdnesses in your model – tnx for making it so clean, but I could eaaasily be missing something)

You need real<lower=0> ... for these standard deviations.

Thanks, but I have real<lower=0> in my code. It must not have come through when I pasted it in. I’m attaching the Stan code and the R code this time.

Kentsimulate.R (9.0 KB)
simulate.stan (1.4 KB)

I’m not sure how it translates to stan_lmer, but you want to write these with vectorization for more efficiency.

y ~ normal(mu_obs, sigma);
...
beta_species[j] ~ normal(0, sigma_species);
to_vector(beta_species_site) ~ normal(0, sigma_species_Site);

the more important change is that you want a non-centered parameterization (see the manual or many list discussions).

Bob,

Thank you for help on the vectorization. My linear algebra is rusty, so I struggle with it, and I didn’t want to mess with it until I was sure I had the logic right.

As for stan_lmer(), I’ve decided to go with the code the way I have it. The simulations I’ve done (R and Stan code in the Github repository linked to earlier) show a lower bias, a lower root mean squared error, and marginally better coverage properties (i.e., closer to the nominal 80% used to construct the symmetric credible intervals) than those in stan_lmer(). Someday, I’d like to sort out the difference, but for now, it’s enough that things are working.

Oh, and on the non-centering, it’s strange that I thought to do it on the hierarchical components and not on the lowest level one. I can’t explain that.

Kent