Hierarchical random intercept model - Stan vs brms

I am fitting a hierarchical random intercept model in Stan and brms but getting different results. Perhaps I’ve incorrectly specified the brms model, but I can’t see the problem.

Data: Y_{ij} \mid \mu, a_i, \sigma \stackrel{indep.}{\sim} N(\mu, \sigma), i = 1,2,\ldots,20; j = 1,2.

Priors:
\mu \sim N(0, 30)
\sigma \sim t_{4}(0, 1)
a_1, \ldots, a_{20} \stackrel{i.i.d}{\sim} N(0, \sigma_a)
\sigma_a \sim t_{4}(0, 1)

Here is reproducible code for the Stan version of the model:

library(rstan)
library(brms)
# data
y <- c(108, 98, 91, 94, 93, 96, 104, 99, 99, 97, 95, 98, 93, 97, 99,
96, 90, 100, 92, 95, 101, 89, 97, 97, 97, 100, 96, 95, 106, 100, 100,
98, 90, 99, 88, 98, 92, 92, 100, 101)
subject <- rep(1:20, each = 2)

stan_code <- "
data {
  vector[40] y;  // response values
}
parameters {
  real mu;      // common mean
  real<lower=0> sigma_a; // sd of between-subject random effects
  real<lower=0> sigma;  // sd of within-subject random effects
  vector[20] a;  // random effect for mean
}
model {
  // prior distributions
  mu ~ normal(0, 30);
  sigma_a ~ student_t(4, 0, 1);
  sigma ~ student_t(4,0, 1);
  // distribution of random effects for mean
  for(i in 1:20) a[i] ~ normal(0, sigma_a);
  // data distribution
  for(i in 1:20) {
    y[(i - 1) * 2 + 1] ~ normal(mu + a[i], sigma);
    y[(i - 1) * 2 + 2] ~ normal(mu + a[i], sigma);
  }
}
"
stan_mod <- stan(model_code = stan_code, data = list(y = y),
                 chains = 4, iter = 10000, refresh = 0, seed = 23)
stan_sum <- summary(stan_mod, prob = c(0.025, 0.975))
round(stan_sum$summary[1:3,c(1, 3:5)], 3)

The posterior results are

mean sd 2.5% 97.5%
\mu 96.693 0.723 95.254 98.105
\sigma_a 1.174 0.730 0.181 2.851
\sigma 4.111 0.530 3.176 5.276

Here is my attempt at fitting the same model using brms:

bdf <- data.frame(y = y, subject = as.factor(subject))
bmod <- brm(y ~ 1 + (1 | subject), data = bdat, family = gaussian(),
            prior = c(set_prior("normal(0, sqrt(1000))", class = "Intercept"),
                      set_prior("student_t(4, 0, 1)", class = "sd"),
                      set_prior("student_t(4, 0, 1)", class = "sd",
                                group = "subject")),
            chains = 4, iter = 10000, refresh = 0, seed = 23)
b_sum <- summary(bmod, prob = 0.95)
round(rbind(b_sum$fixed[1:4], b_sum$random$subject[1:4], b_sum$spec_pars[1:4]), 3)

This produces the following posterior results, which are similar, but not close enough to be the same.

Estimate Est.Error l-95% CI u-95% CI
Intercept 96.695 0.736 95.227 98.142
sd(Intercept) 0.928 0.703 0.035 2.586
sigma 4.344 0.541 3.395 5.512

I looked at the Stan code used by brms, and there is one line that seems to be causing a problem:

stancode(bmod)

The problematic line is

lprior += student_t_lpdf(sigma | 3, 0, 4.4) - 1 * student_t_lccdf(0 | 3, 0, 4.4);

It seems that brms is using t_3(0, 4.4) prior for \sigma instead of t_4(0, 3), as is described in the Stan model and in the brms code.

Any insight into how to produce the Stan results using the brms package is appreciated.

By “brms code”, do you mean the Stan code that brms generates? Because the Stan code you cited has this:

and the brms code has this:

Are you sure that brms isn’t doing some kind of preconditioning of the covariates by default? On the other hand, I don’t see any covariates here!

For your Stan model, you can rescale mu to make initialization and sampling more standard normal,

real<scale=sqrt(1000)> mu;

mu ~ normal(0, sqrt(1000))

If you know that mu is going to have a value around 100, then you can also include offset=100, which will also help with sampling and initialization.

Hi Bob,

Thank you for responding. To clarify my previous post (which I can’t seem to edit now), the Stan code that the brms package generates seems to be using a t_3(0,4.4) prior for \sigma instead of the t_4(0,3) prior that is described in the brms model specification.

I did try the rescaling trick you mentioned (I think you meant multiplier instead of scale?), with similar results.

I think the problem is that you need to specify your priors as belonging to class “sigma”, not “sd”.

This specification:

bmod <- brm(y ~ 1 + (1 | subject), data = bdf, family = gaussian(),
            prior = c(set_prior("normal(0, sqrt(1000))", class = "Intercept"),
                      set_prior("student_t(4, 0, 1)", class = "sigma"),
                      set_prior("student_t(4, 0, 1)", class = "sd", group = "subject")),
            chains = 4, iter = 10000, refresh = 0, seed = 23)

Produces an estimate somewhat closer to Stan’s.

> bmod
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: y ~ 1 + (1 | subject) 
   Data: bdf (Number of observations: 40) 
  Draws: 4 chains, each with iter = 10000; warmup = 5000; thin = 1;
         total post-warmup draws = 20000

Multilevel Hyperparameters:
~subject (Number of levels: 20) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     1.03      0.76     0.04     2.78 1.00     6539     9430

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    96.71      0.71    95.31    98.13 1.00    19433    12867

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     4.13      0.52     3.22     5.24 1.00    11793     9079

To check the priors, use function prior_summary(bmod)

prior_summary(bmod)
                 prior     class      coef   group resp dpar nlpar lb ub       source
 normal(0, sqrt(1000)) Intercept                                                 user
  student_t(3, 0, 4.4)        sd                                    0         default
    student_t(4, 0, 1)        sd           subject                  0            user
    student_t(4, 0, 1)        sd Intercept subject                  0    (vectorized)
    student_t(4, 0, 1)     sigma                                    0            user
3 Likes

Thank you, @mitzimorris. That is the main issue.

The brms package also seems to use a slightly different parameterization for a, which leads to slightly different results, suprisingly.

Instead of doing this in Stan:

a ~ normal(0, sigma_a);

the brms package does something (in psuedocode) like:

z ~ normal(0, 1);
a = sigma_a * z;

The different parameterization seems to have a slight impact on the posterior for sigma_a.

I suspect the standardized approach used by brms is supposed to be numerically superior.

Yes, BRMS is doing the non-centered parameterization. Unless you have a large amount of data, this generally makes it easier for the sampler to navigate the posterior.
See the Stan User’s Guide section on Efficiency Tuning: Efficiency Tuning

1 Like