Brms approach to Stan models for nested versus non-nested multi-level models?

brms underlying Stan code for a non-nested multi-level model, e.g.,

make_stancode(y ~ 1 + (1 | re_1) + (1 | re_2) + (1 | re_3), data = d)

is identical to the Stan code for a nested model,

make_stancode(y ~ 1 + (1 | re_1 / re_2 / re_3), data = d)

whether or not the nesting is natural. It looks like the difference between brms-generated Stan models are in how they index the random effects. If naturally nested, it seems each random effect assumes the indexing of the level before it. Is that right? And if not-naturally nested, how does brms approach indexing?

Here’s some code:

# Simulated data
N    <- 100
y    <- rnorm(N)

# naturally nested random effects
re_1 <- sample(LETTERS[1:5], N, replace = T)
re_2 <- paste0(re_1, sample(seq(5), N, replace = T))
re_3 <- paste0(re_2, sample(seq(5), N, replace = T))
d    <- data.frame(y, re_1, re_2, re_3)

# create stan code and data for nonnested and nested models 
# using naturally nested random effects

library(brms)
nonnested_code <- make_stancode(y ~ 1 + (1 | re_1) + (1 | re_2) + (1 | re_3), data = d)
nonnested_data <- make_standata(y ~ 1 + (1 | re_1) + (1 | re_2) + (1 | re_3), data = d)
nested_code    <- make_stancode(y ~ 1 + (1 | re_1 / re_2 / re_3), data = d)
nested_data    <- make_standata(y ~ 1 + (1 | re_1 / re_2 / re_3), data = d)

# test for equality in code and data
identical(nonnested_code, nested_code) # TRUE
identical(nonnested_data, nested_data) # TRUE

# not naturally nested random effects
re_2 <- paste0(sample(seq(5), N, replace = T))
re_3 <- paste0(sample(seq(5), N, replace = T))
d    <- data.frame(y, re_1, re_2, re_3)

# create stan code and data for nonnested and nested models 
# using naturally nested random effects

nonnested_code <- make_stancode(y ~ 1 + (1 | re_1) + (1 | re_2) + (1 | re_3), data = d)
nonnested_data <- make_standata(y ~ 1 + (1 | re_1) + (1 | re_2) + (1 | re_3), data = d)
nested_code    <- make_stancode(y ~ 1 + (1 | re_1 / re_2 / re_3), data = d)
nested_data    <- make_standata(y ~ 1 + (1 | re_1 / re_2 / re_3), data = d)

# test for equality in code and data
identical(nonnested_code, nested_code) # TRUE
identical(nonnested_data, nested_data) # FALSE

writeLines(nested_code) # look at Stan code
str(nonnested_data)
str(nested_data)
1 Like

(1 | A/B) translates to (1 | A) + (1 | A:B) where A:B simply means creating a new grouping factor with the levels of A and B pasted together. This is explained a little bit longer in https://journal.r-project.org/archive/2018/RJ-2018-017/index.html

1 Like

Thanks. That’s helpful. In looking at the paper, I do have a follow-up question on the paper description,

The / operator indicates nested grouping structures and expands one grouping factor into two or more when using multiple / within one term. For instance, if the user were to write (1 | g1/g2), brms will expand this to (1 | g1) + (1 | g1:g2). Instead of |, users may use || in grouping terms to prevent group-level correlations from being modeled.

In reviewing underlying Stan code and data from my example, I want to make sure I understand how the group-level correlations are being modeled, and when I compare:

nested_code   <- make_stancode(y ~ 1 + (1 | re_1 / re_2 / re_3), data = d)
nested_data   <- make_standata(y ~ 1 + (1 | re_1 / re_2 / re_3), data = d)

nested_codenc <- make_stancode(y ~ 1 + (1 || re_1 / re_2 / re_3), data = d)
nested_datanc <- make_standata(y ~ 1 + (1 || re_1 / re_2 / re_3), data = d)

# test for equality in code and data
identical(nested_code, nested_codenc) # TRUE
identical(nested_data, nested_datanc) # TRUE

The model (stan code and stan data) using | seems identical to the model using ||

That’s because there is nothing to correlate in your model. Replace (1 | ...) by (1 + x | ...) and you will see the difference.

Ah, that makes sense. I misunderstood the sentence as meaning correlation across groups instead of correlation of predictors within each group.

I’d like to have the group (1 | g1) provide/share information for the nested group (1 | g1:g2).

Then you need to make a conceptual proposal of how this should look like :-)

Learning Bayesian methods and Stan coding is difficult and something I don’t know a lot about.

Using and interpreting mulitlevel/mixed/hierarchical models is also hard, much harder than most people realize.

Check out general help on specifying multilevel models.

  • Ben Bolker’s FAQ
    *This oft cited answer in this stack exchange thread
    *Doug Bates article about mixed models in lme4 (he also has a lot of useful slide shows and lecture notes)
    *There is a large archive of questions in archived mailing list [r-sig-mixed-models] archive

Obviously these are not bayesian methods but the formula syntax is largely the same except where explicitly cited in the brms manual.

Nested levels should always be explicitly coded so any specification you use is at least not going to be messed up by that one step.

5 Likes

Would:

Be a start?

1 Like

I don’t see how this can help since the model you linked to does not contain any correlation or similar measure to share information between random effects of different levels. Or am I overlooking something?

As a start, it seems it may be useful to just change the priors from independent groups to using the broader group as the mean for the nested group; same for the standard deviations. So, the brms-generated Stan code in the above example,

target += student_t_lpdf(sd_1 | 3, 0, 10)
    - 1 * student_t_lccdf(0 | 3, 0, 10);
target += normal_lpdf(z_1[1] | 0, 1);

target += student_t_lpdf(sd_2 | 3, 0, 10)
    - 1 * student_t_lccdf(0 | 3, 0, 10);
target += normal_lpdf(z_2[1] | 0, 1);

target += student_t_lpdf(sd_3 | 3, 0, 10)
    - 1 * student_t_lccdf(0 | 3, 0, 10);
target += normal_lpdf(z_3[1] | 0, 1);

Could be changed to,

target += student_t_lpdf(sd_1 | 3, 0, 10)
    - 1 * student_t_lccdf(0 | 3, 0, 10);
target += normal_lpdf(z_1[1] | 0, 1);
  
target += student_t_lpdf(sd_2 | 3, sd_1, 10)
    - 1 * student_t_lccdf(0 | 3, sd_1, 10);
target += normal_lpdf(z_2[1][J_2] | z_1[1][J_1], 1);
  
target += student_t_lpdf(sd_3 | 3, sd_2, 10)
    - 1 * student_t_lccdf(0 | 3, sd_2, 10);
target += normal_lpdf(z_3[1][J_3] | z_2[1][J_2], 1);

For me to seriously consider a proposal, we cannot just change a bunch of lines of Stan code and hope for the model to be still reasonable. Instead it is important to have a clear theory of what is the target, how the desired model looks like, what properties it has and why those makes sense.

With respect to what you are writing above it seems that, at least party, you want to use a centered parameterization (of the random effects), while brms uses the non-centered parameterization throughout. The latter has the advanatage of also applying to non-nested models and to usually having better sampling properties. Apart from that, if both parameterizations exist, they are equivalent.

Thank you for your explanations, I understand more fully what is intended with the brms-generated code. I didn’t mean to imply I’d like you to rework what brms does for me. It’s a really helpful tool, and like that I can even use its code to pick up where it left off on the occasions when I’d like a bit different expression.

I just included the code to show one way I think the information could be minimally shared between the groups, independent of whether a centered parameterization was used.

1 Like