Bayesian regression model equation formulation

I am wondering what a correct/good way is to specify the equations for a linear Bayesian regression model with both population- and group-level effects, fitted in brms, defining all relevant parameters that are estimated (and reported in the summary).

The model is basically fitted by:

brm(formula = y ~ b1 + b2 + b3 + (1|subject), family = 'lognormal')

Here is how I would define it in equation form:

y_i \sim \mathrm{Lognormal} (\mu_i, \sigma_\mathrm{dist})\\
\mu_i = \mathbf{X}_i \boldsymbol{\beta} + \mathbf{Z}_i \mathbf{u}\\
\mathbf{u} \sim \mathcal{N} (\mathbf{0}, \sigma_\mathrm{subject}^2\mathbf{I})

where \boldsymbol{\beta} = [\beta_0, \beta_1, \beta_2, \beta_3]^T are the population-level parameters, and \mathbf{u} are the group-level parameters, which are drawn from a zero-centered multivariate normal distribution with covariance \sigma_\mathrm{subject}^2\mathbf{I}.
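For concreteness, the matrix form above can also be written out per observation (this is just an expansion of the same model, with j[i] denoting the subject of observation i):

```latex
y_i \sim \mathrm{Lognormal}(\mu_i, \sigma_\mathrm{dist}) \\
\mu_i = \beta_0 + \beta_1 b_{1,i} + \beta_2 b_{2,i} + \beta_3 b_{3,i} + u_{j[i]} \\
u_j \sim \mathcal{N}(0, \sigma_\mathrm{subject}^2)
```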

Is this a correct/good definition? Specifically, does the \sigma_\mathrm{subject} correspond to the group-level effect estimate that is reported in the brms summary output as sd(Intercept)?

  • Operating System: Ubuntu 16.04
  • brms Version: 2.6.0

Looks good to me.

Yeah. There are two intercept terms there: one is the population-level intercept, and then there are the group-level intercepts. sd(Intercept) is the one corresponding to the standard deviation of the group-level intercepts.
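In the notation of the first post, the effective intercept for subject j combines both terms:

```latex
\text{intercept}_j = \beta_0 + u_j, \qquad u_j \sim \mathcal{N}(0, \sigma_\mathrm{subject}^2)
```

so sd(Intercept) in the summary is the posterior for \sigma_\mathrm{subject}.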

This doesn’t really help with the sd(Intercept) question, but when I have questions about brms models I like to just generate the Stan code and look at what is happening.

So something like:

library(tidyverse)
library(brms)

data = tibble(subject = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
              y = exp(rnorm(9)),
              b1 = rnorm(9),
              b2 = rnorm(9),
              b3 = rnorm(9))

make_stancode(y ~ b1 + b2 + b3 + (1|subject),
              data = data,
              family = 'lognormal')

The model bit with the priors taken out is:

model {
  // initialize linear predictor term
  vector[N] mu = Intercept + Xc * b;
  for (n in 1:N) {
    // add more terms to the linear predictor
    mu[n] += r_1_1[J_1[n]] * Z_1_1[n];
  }
  // priors including all constants
  ...
  // likelihood including all constants
  if (!prior_only) {
    target += lognormal_lpdf(Y | mu, sigma);
  }
}

From that you can try to break down if the formula you gave brms is getting you the model you expect.
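To map the Stan code back onto the equations from the first post (my reading): `Intercept + Xc * b` is the population-level part on centered covariates, and the loop adds the group-level part, so that

```latex
\mu_n = \beta_0 + \mathbf{x}^{c}_n \mathbf{b} + r_{1,1}[J_1[n]] \cdot Z_{1,1}[n]
```

where r_{1,1} plays the role of \mathbf{u} and J_1[n] picks out the subject of observation n.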


Great, thank you, also for the nice example.

I haven’t quite worked out what this is actually doing yet (I am not very familiar with Stan). Looking at the complete Stan code, it seems that J_1 contains which subject is associated with row n (‘grouping indicator per observation’). r_1_1 I assume is then the equivalent of the \mathbf{u} vector, but what is Z_1_1 then (‘group-level predictor values’)?

Also, checking brm_model$fit, it seems that there are some individual intercepts (r_subject[1,Intercept] etc.):

                         mean se_mean   sd   2.5%    25%    50%    75%  97.5% n_eff Rhat
b_Intercept             -0.07    0.13 2.33  -5.15  -1.15  -0.09   1.00   5.04   313 1.01
b_b1                    -0.07    0.03 0.86  -1.99  -0.50  -0.07   0.39   1.62  1116 1.00
b_b2                     0.66    0.05 1.97  -3.56  -0.34   0.68   1.74   4.61  1316 1.00
b_b3                     0.78    0.04 1.59  -2.57  -0.04   0.78   1.68   3.73  1292 1.00
sd_subject__Intercept    3.25    0.19 3.19   0.11   1.07   2.24   4.33  11.30   289 1.01
sigma                    2.45    0.05 1.30   1.08   1.61   2.12   2.90   5.91   670 1.01
Intercept                0.28    0.14 2.23  -4.26  -0.69   0.25   1.21   5.17   264 1.02
r_subject[1,Intercept]  -0.76    0.16 2.67  -6.80  -1.86  -0.46   0.34   4.40   287 1.01
r_subject[2,Intercept]  -0.20    0.13 2.48  -6.30  -1.14  -0.03   0.83   4.83   344 1.01
r_subject[3,Intercept]   0.78    0.13 2.65  -5.20  -0.37   0.41   1.90   6.75   388 1.01
lp__                   -32.53    0.15 3.33 -40.15 -34.45 -32.07 -30.18 -26.97   519 1.02

Do you know how to interpret those and how they are related to sd_subject__Intercept, and should they be included in the equation?

There is also make_standata, which is useful for this stuff. Z_1_1 holds the group-level predictor values that multiply the group-level coefficients; for a plain (1|subject) term they are all 1. That’s probably not a great description, but if you had a

(1 + 0.5 * x | group)

term, the Z__ values would handle the 0.5 * x part.
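A quick way to confirm this (a sketch, reusing the same kind of fake data as above) is to inspect the data brms would pass to Stan:

```r
library(brms)

# fake data in the same shape as the earlier example
data <- data.frame(subject = rep(1:3, each = 3),
                   y = exp(rnorm(9)),
                   b1 = rnorm(9), b2 = rnorm(9), b3 = rnorm(9))

sdata <- make_standata(y ~ b1 + b2 + b3 + (1 | subject),
                       data = data, family = "lognormal")

sdata$Z_1_1  # all ones for a plain (1 | subject) term
sdata$J_1    # grouping indicator: which subject each row belongs to
```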

I’m assuming these are the group-level intercepts, and you’re estimating sd_subject__Intercept, which is the standard deviation of those group-level intercepts.


Thanks for the further clarification. Sounds reasonable for Z_1_1.

That sounds reasonable as well. But those do not need to appear in the equations, do they? Since it is only the “aggregated” sd of the group-level effects (i.e. the \sigma_\mathrm{subject} of the normal distribution) that is being estimated?

Sorry, could you expand on this?

Should the group-level intercepts (r_subject[1,Intercept], r_subject[2,Intercept], etc.) appear somehow in the model equation (initial post)? I would say no, as to me they seem to be more of a “byproduct” of the estimation. So what is being estimated in the end (for the model) is the covariance of the zero-centered multivariate normal distribution of the group-level effects, which is set to have variance \sigma_\mathrm{subject}^2 on all diagonal elements. Is that right?

They’re in this term:

(1|subject)

What that means is create an intercept term for every subject group.

Whether they are a byproduct or not, you still need to account for them in some way or another, be that sampling them and ignoring them, integrating them out, or something else.

I think lme4 might integrate them out? There might be an extra assumption they make, I’m not sure. We sample them though.
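Roughly speaking (my summary, not guaranteed to match lme4's internals exactly), “integrating them out” means working with the marginal likelihood instead of sampling the u_j:

```latex
p(\mathbf{y} \mid \boldsymbol{\beta}, \sigma, \sigma_\mathrm{subject})
  = \int p(\mathbf{y} \mid \boldsymbol{\beta}, \mathbf{u}, \sigma)\,
         p(\mathbf{u} \mid \sigma_\mathrm{subject})\, d\mathbf{u}
```

Either way the u_j are part of the model; sampling just means you also get posterior draws for them, which is what the r_subject[...] rows in the summary are.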

Not sure if I was clear before, but what I was after was whether there is anything else to add to the (mathematical) equation, due to the individual group-level intercept contributions, or whether the “common” (shared across all individuals, see below) normal distribution is enough when specifying the model?