Bayesian regression model equation formulation

I am wondering about a correct/good way to specify the equations for a linear Bayesian regression model with both population- and group-level effects, fitted in brms, defining all relevant parameters that are estimated (and reported in the summary).

The model is basically fitted by:

brm(formula = y ~ b1 + b2 + b3 + (1|subject), family = 'lognormal')

Here is how I would define it in equation form:

y_i \sim \mathrm{Lognormal}(\mu_i, \sigma_\mathrm{dist})
\mu_i = \mathbf{X}_i \boldsymbol{\beta} + \mathbf{Z}_i \mathbf{u}
\mathbf{u} \sim \mathcal{N}(\mathbf{0}, \sigma_\mathrm{subject}^2 \mathbf{I})

where \boldsymbol{\beta} = [\beta_0, \beta_1, \beta_2, \beta_3]^T are the population-level parameters, and \mathbf{u} are the group-level parameters, sampled from a multivariate, zero-centered normal distribution with covariance matrix \sigma_\mathrm{subject}^2 \mathbf{I}.

Is this a correct/good definition? Specifically, does the \sigma_\mathrm{subject} correspond to the group-level effect estimate that is reported in the brms summary output as sd(Intercept)?

  • Operating System: Ubuntu 16.04
  • brms Version: 2.6.0

Looks good to me.

Yeah. There are two intercept terms there: one is the population-level intercept, and then there are the group-level intercepts. sd(Intercept) is the standard deviation of the group-level intercepts.
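In scalar form (using the notation from the original post, with x_{1i}, x_{2i}, x_{3i} denoting the predictor values for observation i), one way to write that out is:

\mu_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + u_{\mathrm{subject}[i]}
u_j \sim \mathcal{N}(0, \sigma_\mathrm{subject}^2)

where u_{\mathrm{subject}[i]} is the intercept offset for the subject of observation i, and sd(Intercept) in the summary is the estimate of \sigma_\mathrm{subject}.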

This doesn’t really help with the sd(Intercept) question, but when I have questions about brms models I like to just generate the Stan code and look at what is happening.

So something like:

library(tidyverse)
library(brms)

data = tibble(subject = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
              y = exp(rnorm(9)),
              b1 = rnorm(9),
              b2 = rnorm(9),
              b3 = rnorm(9))

make_stancode(y ~ b1 + b2 + b3 + (1|subject),
              data = data,
              family = 'lognormal')

The model bit with the priors taken out is:

model {
  // initialize linear predictor term
  vector[N] mu = Intercept + Xc * b;
  for (n in 1:N) {
    // add more terms to the linear predictor
    mu[n] += r_1_1[J_1[n]] * Z_1_1[n];
  }
  // priors including all constants
  ...
  // likelihood including all constants
  if (!prior_only) {
    target += lognormal_lpdf(Y | mu, sigma);
  }
}

From that you can try to break down if the formula you gave brms is getting you the model you expect.


Great, thank you, also for the nice example.

I haven’t quite figured out yet what this is actually doing (I’m not very familiar with Stan). Looking at the complete Stan code, it seems that J_1 contains which subject is associated with row n (‘grouping indicator per observation’). r_1_1, I assume, is then the equivalent of the \mathbf{u} vector, but what is Z_1_1 then (‘group-level predictor values’)?

Also, checking brm_model$fit, it seems that there are some individual intercepts (r_subject[1,Intercept] etc.):

                         mean se_mean   sd   2.5%    25%    50%    75%  97.5% n_eff Rhat
b_Intercept             -0.07    0.13 2.33  -5.15  -1.15  -0.09   1.00   5.04   313 1.01
b_b1                    -0.07    0.03 0.86  -1.99  -0.50  -0.07   0.39   1.62  1116 1.00
b_b2                     0.66    0.05 1.97  -3.56  -0.34   0.68   1.74   4.61  1316 1.00
b_b3                     0.78    0.04 1.59  -2.57  -0.04   0.78   1.68   3.73  1292 1.00
sd_subject__Intercept    3.25    0.19 3.19   0.11   1.07   2.24   4.33  11.30   289 1.01
sigma                    2.45    0.05 1.30   1.08   1.61   2.12   2.90   5.91   670 1.01
Intercept                0.28    0.14 2.23  -4.26  -0.69   0.25   1.21   5.17   264 1.02
r_subject[1,Intercept]  -0.76    0.16 2.67  -6.80  -1.86  -0.46   0.34   4.40   287 1.01
r_subject[2,Intercept]  -0.20    0.13 2.48  -6.30  -1.14  -0.03   0.83   4.83   344 1.01
r_subject[3,Intercept]   0.78    0.13 2.65  -5.20  -0.37   0.41   1.90   6.75   388 1.01
lp__                   -32.53    0.15 3.33 -40.15 -34.45 -32.07 -30.18 -26.97   519 1.02

Do you know how to interpret those and how they are related to sd_subject__Intercept, and should they be included in the equation?


There is also make_standata, which is useful for this stuff. Z_1_1 are just constants that multiply things. That’s probably not a great description, but if you had a (1 + 0.5 * x | group) term, the Z__ constants would handle the 0.5.
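For a plain (1|subject) intercept term like here, Z_1_1 should just be a vector of ones. A quick way to check, reusing the data tibble from the example above:

# Inspect the data that brms passes to Stan
standata = make_standata(y ~ b1 + b2 + b3 + (1|subject),
                         data = data,
                         family = 'lognormal')
standata$Z_1_1  # all ones here, since the group-level term is an intercept
standata$J_1    # which subject each observation belongs to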

I’m assuming these are the group-level intercepts, and you’re estimating sd_subject__Intercept, which is the standard deviation of these group-level intercepts.


Thanks for the further clarification. Sounds reasonable for Z_1_1.

That sounds reasonable as well. But those do not need to appear in the equations, do they? Since it is only the “aggregated” sd of the group-level effects (i.e., of the normal distribution, \sigma_\mathrm{subject}) that is being estimated?

Sorry, could you expand on this?

Should the group-level intercepts (r_subject[1,Intercept], r_subject[2,Intercept], etc.) appear somehow in the model equation (initial post)? I would say no, as to me they seem to be more of a “byproduct” of the estimation. So what is being estimated in the end (for the model) is the covariance of the multivariate, zero-centered normal distribution of the group-level effects, which is set to have variance \sigma_\mathrm{subject}^2 on all diagonal elements. Is that right?

They’re in this term:

(1|subject)

What that means is “create an intercept term for every subject group”.

Whether they are a byproduct or not, you still need to account for them one way or another, be that sampling them and ignoring them, integrating them out, or something else.

I think lme4 might integrate them out? There might be an extra assumption they make, I’m not sure. We sample them though.
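You can also inspect the sampled group-level intercepts directly in brms (a quick sketch, assuming fit is the model object fitted with the original formula):

ranef(fit)$subject  # posterior summaries of the subject-specific offsets (r_subject)
coef(fit)$subject   # population-level intercept combined with the subject offsets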

Not sure if I was clear before, but what I was after was whether there is anything else to add to the (mathematical) equation due to the individual group-level intercept contributions, or whether the “common” (across all subjects, see below) normal distribution is enough when specifying the model?

Oooops sorry I never posted my reply:

\mathbf{u} there is a vector, so that notation expands to something like:

u_1 \sim \mathcal{N}(0, \sigma^2_\mathrm{subject})
u_2 \sim \mathcal{N}(0, \sigma^2_\mathrm{subject})
u_3 \sim \mathcal{N}(0, \sigma^2_\mathrm{subject})
\ldots

(however many different subjects you have)

In the original notation, I think we encoded an intercept in the X (a column of 1s).
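Concretely, for this model each row of the design matrix would be

\mathbf{X}_i = [1,\ x_{1i},\ x_{2i},\ x_{3i}]

so that \mathbf{X}_i \boldsymbol{\beta} = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i}.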

I don’t know what you mean by that.

If you put a proper prior on all the parameters, I think that’s all you technically need. Depending on the data, you might have identifiability problems.

Certainly what is written there is a very standard way to parameterize group-level terms.

Thanks for confirming. I will leave the equation as it was stated in the original post then.

I assume that the column of 1s is in the \mathbf{X} matrix (i.e., the first column; the other columns hold the predictor values for the population-level effects of b1, b2, b3).

Can you elaborate on this?

Here’s an example of how correlated data can cause identifiability problems in a model. This first model should run fine:

library(brms)
library(tidyverse)

# Two independent predictors
N = 10
x1 = rnorm(N, 0, 1)
x2 = rnorm(N, 0, 1)

# The outcome depends on the sum of the two predictors
y = rnorm(length(x1), x1 + x2, 0.1)

fit = brm(y ~ x1 + x2 - 1,
          data = tibble(y = y, x1 = x1, x2 = x2),
          prior = set_prior("normal(0, 1)", class = "b") +
            set_prior("normal(0, 1)", class = "sigma"))

pairs(fit)

In this second version, there’s a non-identifiability between the coefficients of x1 and x2:

library(brms)
library(tidyverse)

N = 10
x1 = rnorm(N, 0, 1)
# x2 is nearly a copy of x1, so the two predictors are almost collinear
x2 = rnorm(length(x1), x1, 0.1)

y = rnorm(length(x1), x1 + x2, 0.1)

fit = brm(y ~ x1 + x2 - 1,
          data = tibble(y = y, x1 = x1, x2 = x2),
          prior = set_prior("normal(0, 1)", class = "b") +
            set_prior("normal(0, 1)", class = "sigma"))

pairs(fit)
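One way to see the problem numerically (a sketch, assuming fit is the second model above): only the sum of the two coefficients is well identified by the data, so their posterior draws end up strongly negatively correlated.

draws = posterior_samples(fit)
cor(draws$b_x1, draws$b_x2)  # should be close to -1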

Back to this real quick:

For what it’s worth, there’s no process by which we formally bless certain models as being good or not. You’ll make that decision based on fake data simulations of your problem, posterior predictive checks with the real data, and whatever considerations come up in whatever domain you’re working in.
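For example, a quick graphical posterior predictive check in brms is just (assuming a fitted model object fit):

pp_check(fit)  # overlay draws from the posterior predictive distribution on the observed y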

The thing of note here is that even though the model you gave originally might make sense in one context with some data, it might make no sense in another situation with different data.

So the model you wrote in the first post looks fine, insofar as it is a model. Beyond that it’s hard to do anything other than guess (which is where the identifiability thing came from :D).
