Hierarchical random intercept model - Stan vs brms

ice_cooler · May 2, 2025, 9:29pm

I am fitting a hierarchical random intercept model in Stan and brms but getting different results. Perhaps I’ve incorrectly specified the brms model, but I can’t see the problem.

Data: Y_{ij} \mid \mu, a_i, \sigma \stackrel{indep.}{\sim} N(\mu, \sigma), i = 1,2,\ldots,20; j = 1,2.

Priors:
\mu \sim N(0, 30)
\sigma \sim t_{4}(0, 1)
a_1, \ldots, a_{20} \stackrel{i.i.d}{\sim} N(0, \sigma_a)
\sigma_a \sim t_{4}(0, 1)

Here is reproducible code for the Stan version of the model:

library(rstan)
library(brms)
# data
y <- c(108, 98, 91, 94, 93, 96, 104, 99, 99, 97, 95, 98, 93, 97, 99,
96, 90, 100, 92, 95, 101, 89, 97, 97, 97, 100, 96, 95, 106, 100, 100,
98, 90, 99, 88, 98, 92, 92, 100, 101)
subject <- rep(1:20, each = 2)

stan_code <- "
data {
  vector[40] y;  // response values
}
parameters {
  real mu;      // common mean
  real<lower=0> sigma_a; // sd of between-subject random effects
  real<lower=0> sigma;  // sd of within-subject random effects
  vector[20] a;  // random effect for mean
}
model {
  // prior distributions
  mu ~ normal(0, 30);
  sigma_a ~ student_t(4, 0, 1);
  sigma ~ student_t(4,0, 1);
  // distribution of random effects for mean
  for(i in 1:20) a[i] ~ normal(0, sigma_a);
  // data distribution
  for(i in 1:20) {
    y[(i - 1) * 2 + 1] ~ normal(mu + a[i], sigma);
    y[(i - 1) * 2 + 2] ~ normal(mu + a[i], sigma);
  }
}
"
stan_mod <- stan(model_code = stan_code, data = list(y = y),
                 chains = 4, iter = 10000, refresh = 0, seed = 23)
stan_sum <- summary(stan_mod, prob = c(0.025, 0.975))
round(stan_sum$summary[1:3,c(1, 3:5)], 3)

The posterior results are

	mean	sd	2.5%	97.5%
\mu	96.693	0.723	95.254	98.105
\sigma_a	1.174	0.730	0.181	2.851
\sigma	4.111	0.530	3.176	5.276

Here is my attempt at fitting the same model using brms:

bdf <- data.frame(y = y, subject = as.factor(subject))
bmod <- brm(y ~ 1 + (1 | subject), data = bdat, family = gaussian(),
            prior = c(set_prior("normal(0, sqrt(1000))", class = "Intercept"),
                      set_prior("student_t(4, 0, 1)", class = "sd"),
                      set_prior("student_t(4, 0, 1)", class = "sd",
                                group = "subject")),
            chains = 4, iter = 10000, refresh = 0, seed = 23)
b_sum <- summary(bmod, prob = 0.95)
round(rbind(b_sum$fixed[1:4], b_sum$random$subject[1:4], b_sum$spec_pars[1:4]), 3)

This produces the following posterior results, which are similar, but not close enough to be the same.

	Estimate	Est.Error	l-95% CI	u-95% CI
Intercept	96.695	0.736	95.227	98.142
sd(Intercept)	0.928	0.703	0.035	2.586
sigma	4.344	0.541	3.395	5.512

I looked at the Stan code used by brms, and there is one line that seems to be causing a problem:

stancode(bmod)

The problematic line is

lprior += student_t_lpdf(sigma | 3, 0, 4.4) - 1 * student_t_lccdf(0 | 3, 0, 4.4);

It seems that brms is using t_3(0, 4.4) prior for \sigma instead of t_4(0, 3), as is described in the Stan model and in the brms code.

Any insight into how to produce the Stan results using the brms package is appreciated.

Bob_Carpenter · May 5, 2025, 4:24pm

By “brms code”, do you mean the Stan code that brms generates? Because the Stan code you cited has this:

and the brms code has this:

Are you sure that brms isn’t doing some kind of preconditioning of the covariates by default? On the other hand, I don’t see any covariates here!

For your Stan model, you can rescale mu to make initialization and sampling more standard normal,

real<scale=sqrt(1000)> mu;

mu ~ normal(0, sqrt(1000))

If you know that mu is going to have a value around 100, then you can also include offset=100, which will also help with sampling and initialization.

ice_cooler · May 5, 2025, 5:21pm

Hi Bob,

Thank you for responding. To clarify my previous post (which I can’t seem to edit now), the Stan code that the brms package generates seems to be using a t_3(0,4.4) prior for \sigma instead of the t_4(0,3) prior that is described in the brms model specification.

I did try the rescaling trick you mentioned (I think you meant multiplier instead of scale?), with similar results.

mitzimorris · May 5, 2025, 6:23pm

I think the problem is that you need to specify your priors as belonging to class “sigma”, not “sd”.

This specification:

bmod <- brm(y ~ 1 + (1 | subject), data = bdf, family = gaussian(),
            prior = c(set_prior("normal(0, sqrt(1000))", class = "Intercept"),
                      set_prior("student_t(4, 0, 1)", class = "sigma"),
                      set_prior("student_t(4, 0, 1)", class = "sd", group = "subject")),
            chains = 4, iter = 10000, refresh = 0, seed = 23)

Produces an estimate somewhat closer to Stan’s.

> bmod
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: y ~ 1 + (1 | subject) 
   Data: bdf (Number of observations: 40) 
  Draws: 4 chains, each with iter = 10000; warmup = 5000; thin = 1;
         total post-warmup draws = 20000

Multilevel Hyperparameters:
~subject (Number of levels: 20) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     1.03      0.76     0.04     2.78 1.00     6539     9430

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    96.71      0.71    95.31    98.13 1.00    19433    12867

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     4.13      0.52     3.22     5.24 1.00    11793     9079

To check the priors, use function prior_summary(bmod)

prior_summary(bmod)
                 prior     class      coef   group resp dpar nlpar lb ub       source
 normal(0, sqrt(1000)) Intercept                                                 user
  student_t(3, 0, 4.4)        sd                                    0         default
    student_t(4, 0, 1)        sd           subject                  0            user
    student_t(4, 0, 1)        sd Intercept subject                  0    (vectorized)
    student_t(4, 0, 1)     sigma                                    0            user

ice_cooler · May 6, 2025, 7:51pm

Thank you, @mitzimorris. That is the main issue.

The brms package also seems to use a slightly different parameterization for a, which leads to slightly different results, suprisingly.

Instead of doing this in Stan:

a ~ normal(0, sigma_a);

the brms package does something (in psuedocode) like:

z ~ normal(0, 1);
a = sigma_a * z;

The different parameterization seems to have a slight impact on the posterior for sigma_a.

I suspect the standardized approach used by brms is supposed to be numerically superior.

mitzimorris · May 6, 2025, 10:49pm

Yes, BRMS is doing the non-centered parameterization. Unless you have a large amount of data, this generally makes it easier for the sampler to navigate the posterior.
See the Stan User’s Guide section on Efficiency Tuning: Efficiency Tuning

Topic		Replies	Views
Reproducing brms results in rstan yields different estimates brms	7	806	July 30, 2020
Questions about parameterization of random intercepts and slopes model with group-level predictor (rstan vs. brms) Modeling specification	2	1186	February 19, 2019
Why do Stan model and rstanarm/brms model with same priors have different posterior? Modeling cmdstanr	2	546	January 14, 2023
Stan_gamm4 and brms unexplained difference in intercept estimates brms specification	5	567	June 2, 2020
Estimates from hierarchical longitudinal analysis in stan much less certain than estimates from same model in brms Modeling	9	1106	September 24, 2020

Hierarchical random intercept model - Stan vs brms

Related topics