Bayesian regression model equation formulation

I am wondering about a correct/good way to specify the equations for a linear Bayesian regression model with both population- and group-level effects, fitted in brms, defining all relevant parameters that are estimated (and reported in the summary).

The model is basically fitted by:

brm(formula = y ~ b1 + b2 + b3 + (1|subject), family = 'lognormal')

Here is how I would define it in equation form:

y_i \sim \mathrm{Lognormal}(\mu_i, \sigma_\mathrm{dist})
\mu_i = \mathbf{X}_i \boldsymbol{\beta} + \mathbf{Z}_i \mathbf{u}
\mathbf{u} \sim \mathcal{N}(\mathbf{0}, \sigma_\mathrm{subject}^2 \mathbf{I})

where \boldsymbol{\beta} = [\beta_0, \beta_1, \beta_2, \beta_3]^T are the population-level parameters, and \mathbf{u} are the group-level parameters, sampled from a multivariate, zero-centered normal distribution with covariance matrix \sigma_\mathrm{subject}^2 \mathbf{I}.

Is this a correct/good definition? Specifically, does the \sigma_\mathrm{subject} correspond to the group-level effect estimate that is reported in the brms summary output as sd(Intercept)?

  • Operating System: Ubuntu 16.04
  • brms Version: 2.6.0

Looks good to me.

Yeah. There are two intercept terms there: one is the population-level intercept, and then there are the group-level intercepts. sd(Intercept) is the standard deviation of the group-level intercepts.
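In scalar form (using the notation from the original post, with x_{1i}, x_{2i}, x_{3i} denoting the predictor values for observation i), one way to write that out is:

\mu_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + u_{\mathrm{subject}[i]}
u_j \sim \mathcal{N}(0, \sigma_\mathrm{subject}^2)

where u_{\mathrm{subject}[i]} is the intercept offset for the subject of observation i, and sd(Intercept) in the summary is the estimate of \sigma_\mathrm{subject}.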

This doesn’t really help with the sd(Intercept) question, but when I have questions about brms models I like to just generate the Stan code and look at what is happening.

So something like:

library(tidyverse)
library(brms)

data = tibble(subject = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
              y = exp(rnorm(9)),
              b1 = rnorm(9),
              b2 = rnorm(9),
              b3 = rnorm(9))

make_stancode(y ~ b1 + b2 + b3 + (1|subject),
              data = data,
              family = 'lognormal')

The model bit with the priors taken out is:

model {
  // initialize linear predictor term
  vector[N] mu = Intercept + Xc * b;
  for (n in 1:N) {
    // add more terms to the linear predictor
    mu[n] += r_1_1[J_1[n]] * Z_1_1[n];
  }
  // priors including all constants
  ...
  // likelihood including all constants
  if (!prior_only) {
    target += lognormal_lpdf(Y | mu, sigma);
  }
}

From that you can try to break down if the formula you gave brms is getting you the model you expect.


Great, thank you, also for the nice example.

I haven’t quite figured out yet what this is actually doing (I’m not very familiar with Stan). Looking at the complete Stan code, it seems that J_1 contains which subject is associated with row n (‘grouping indicator per observation’). r_1_1, I assume, is then the equivalent of the \mathbf{u} vector, but what is Z_1_1 then (‘group-level predictor values’)?

Also, checking brm_model$fit, it seems that there are some individual intercepts (r_subject[1,Intercept] etc.):

                         mean se_mean   sd   2.5%    25%    50%    75%  97.5% n_eff Rhat
b_Intercept             -0.07    0.13 2.33  -5.15  -1.15  -0.09   1.00   5.04   313 1.01
b_b1                    -0.07    0.03 0.86  -1.99  -0.50  -0.07   0.39   1.62  1116 1.00
b_b2                     0.66    0.05 1.97  -3.56  -0.34   0.68   1.74   4.61  1316 1.00
b_b3                     0.78    0.04 1.59  -2.57  -0.04   0.78   1.68   3.73  1292 1.00
sd_subject__Intercept    3.25    0.19 3.19   0.11   1.07   2.24   4.33  11.30   289 1.01
sigma                    2.45    0.05 1.30   1.08   1.61   2.12   2.90   5.91   670 1.01
Intercept                0.28    0.14 2.23  -4.26  -0.69   0.25   1.21   5.17   264 1.02
r_subject[1,Intercept]  -0.76    0.16 2.67  -6.80  -1.86  -0.46   0.34   4.40   287 1.01
r_subject[2,Intercept]  -0.20    0.13 2.48  -6.30  -1.14  -0.03   0.83   4.83   344 1.01
r_subject[3,Intercept]   0.78    0.13 2.65  -5.20  -0.37   0.41   1.90   6.75   388 1.01
lp__                   -32.53    0.15 3.33 -40.15 -34.45 -32.07 -30.18 -26.97   519 1.02

Do you know how to interpret those and how they are related to sd_subject__Intercept, and should they be included in the equation?


There is also make_standata, which is useful for this stuff. Z_1_1 are just constants that multiply things. That’s probably not a great description, but if you had a (1 + 0.5 * x | group) term, the Z__ constants would handle the 0.5.
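For a plain (1|subject) intercept term like here, Z_1_1 should just be a vector of ones. A quick way to check, reusing the data tibble from the example above:

# Inspect the data that brms passes to Stan
standata = make_standata(y ~ b1 + b2 + b3 + (1|subject),
                         data = data,
                         family = 'lognormal')
standata$Z_1_1  # all ones here, since the group-level term is an intercept
standata$J_1    # which subject each observation belongs to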

I’m assuming these are the group-level intercepts, and you’re estimating sd_subject__Intercept, which is the standard deviation of these group-level intercepts.


Thanks for the further clarification. Sounds reasonable for Z_1_1.

That sounds reasonable as well. But those do not need to appear in the equations, do they? Since it is only the “aggregated” sd of the group-level effects (i.e., of the normal distribution, \sigma_\mathrm{subject}) that is being estimated?

Sorry, could you expand on this?

Should the group-level intercepts (r_subject[1,Intercept], r_subject[2,Intercept], etc.) appear somehow in the model equation (initial post)? I would say no, as to me they seem to be more of a “byproduct” of the estimation. So what is being estimated in the end (for the model) is the covariance of the multivariate, zero-centered normal distribution of the group-level effects, which is set to have variance \sigma_\mathrm{subject}^2 on all diagonal elements. Is that right?

They’re in this term:

(1|subject)

What that means is “create an intercept term for every subject group”.

Whether they are a byproduct or not, you still need to account for them one way or another, be that sampling them and ignoring them, integrating them out, or something else.

I think lme4 might integrate them out? There might be an extra assumption they make, I’m not sure. We sample them though.
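You can also inspect the sampled group-level intercepts directly in brms (a quick sketch, assuming fit is the model object fitted with the original formula):

ranef(fit)$subject  # posterior summaries of the subject-specific offsets (r_subject)
coef(fit)$subject   # population-level intercept combined with the subject offsets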

Not sure if I was clear before, but what I was after was whether there is anything else to add to the (mathematical) equation due to the individual group-level intercept contributions, or whether the “common” (across all subjects, see below) normal distribution is enough when specifying the model?

Oooops sorry I never posted my reply:

\mathbf{u} there is a vector, so that notation expands to something like:

u_1 \sim \mathcal{N}(0, \sigma^2_\mathrm{subject})
u_2 \sim \mathcal{N}(0, \sigma^2_\mathrm{subject})
u_3 \sim \mathcal{N}(0, \sigma^2_\mathrm{subject})
\ldots

(however many different subjects you have)

In the original notation, I think we encoded an intercept in the X (a column of 1s).
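Concretely, for this model each row of the design matrix would be

\mathbf{X}_i = [1,\ x_{1i},\ x_{2i},\ x_{3i}]

so that \mathbf{X}_i \boldsymbol{\beta} = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i}.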

I don’t know what you mean by that.

If you put a proper prior on all the parameters, I think that’s all you technically need. Depending on the data, you might have identifiability problems.

Certainly what is written there is a very standard way to parameterize group-level terms.

Thanks for confirming. I will leave the equation as it was stated in the original post then.

I assume that the column of 1s is in the \mathbf{X} matrix (i.e., the first column; the other columns hold the predictor values for the population-level effects of b1, b2, b3).

Can you elaborate on this?

Here’s an example of how correlated data can cause identifiability problems in a model. This first model should run fine:

library(brms)
library(tidyverse)

# Two independent predictors
N = 10
x1 = rnorm(N, 0, 1)
x2 = rnorm(N, 0, 1)

# The outcome depends on the sum of the two predictors
y = rnorm(length(x1), x1 + x2, 0.1)

fit = brm(y ~ x1 + x2 - 1,
          data = tibble(y = y, x1 = x1, x2 = x2),
          prior = set_prior("normal(0, 1)", class = "b") +
            set_prior("normal(0, 1)", class = "sigma"))

pairs(fit)

In this second version, there’s a non-identifiability between the coefficients of x1 and x2:

library(brms)
library(tidyverse)

N = 10
x1 = rnorm(N, 0, 1)
# x2 is nearly a copy of x1, so the two predictors are almost collinear
x2 = rnorm(length(x1), x1, 0.1)

y = rnorm(length(x1), x1 + x2, 0.1)

fit = brm(y ~ x1 + x2 - 1,
          data = tibble(y = y, x1 = x1, x2 = x2),
          prior = set_prior("normal(0, 1)", class = "b") +
            set_prior("normal(0, 1)", class = "sigma"))

pairs(fit)
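One way to see the problem numerically (a sketch, assuming fit is the second model above): only the sum of the two coefficients is well identified by the data, so their posterior draws end up strongly negatively correlated.

draws = posterior_samples(fit)
cor(draws$b_x1, draws$b_x2)  # should be close to -1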

Back to this real quick:

For what it’s worth, there’s no process by which we formally bless certain models as being good or not. You’ll make that decision based on fake data simulations of your problem, posterior predictive checks with the real data, and whatever considerations come up in whatever domain you’re working in.
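For example, a quick graphical posterior predictive check in brms is just (assuming a fitted model object fit):

pp_check(fit)  # overlay draws from the posterior predictive distribution on the observed y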

The thing of note here is that even though the model you gave originally might make sense in one context with some data, it might make no sense in another situation with different data.

So the model you wrote in the first post looks fine, insofar as it is a model. Beyond that it’s hard to do anything other than guess (which is where the identifiability thing came from :D).
