Understanding sample_new_levels = "uncertainty", "gaussian", and "old_levels"

The documentation for brms::extract_draws() says this about the sample_new_levels option:

Indicates how to sample new levels for grouping factors specified in re_formula. This argument is only relevant if newdata is provided and allow_new_levels is set to TRUE. If "uncertainty" (default), include group-level uncertainty in the predictions based on the variation of the existing levels. If "gaussian", sample new levels from the (multivariate) normal distribution implied by the group-level standard deviations and correlations. This option may be useful for conducting Bayesian power analysis. If "old_levels", directly sample new levels from the existing levels.

I’m having a hard time understanding what the differences between the three options (“uncertainty”, “gaussian”, and “old_levels”) are.

How does “uncertainty” include group-level uncertainty based on the variation of the existing levels? Does it randomly sample from the existing group levels?

For “gaussian”, am I correct that for a term like (1 | grp), this option would return something like rnorm(n_draws, 0, sd_grp__Intercept)?

For “old_levels”, is one of the group levels chosen randomly at the start, and then only its draws are used?


I’ve read this discussion on the brms GitHub and I’ve taken a look at the code for extract_draws.R, but I’m still unsure about this.


Your understanding of “gaussian” and “old_levels” is correct.

“uncertainty” can be thought of as fulfilling a similar purpose as “gaussian”, but instead of sampling from a Gaussian distribution (which may or may not be a good representation of the distribution of group-level effects), we sample from the distribution directly implied by the old levels, in a bootstrap kind of fashion.

Hi Paul,

Could you clarify what draws are being bootstrapped? I can’t tell from the source code. For example, what would the bootstrapping look like for a simple Gaussian model like y ~ (1 | grp) when I call posterior_linpred(fit, newdata = data.frame(grp = "new_group"), allow_new_levels = TRUE)?

Perhaps “bootstrapping” is not the right term. Here is what I mean. Suppose grp has 3 levels, call them a, b, and c, and suppose we have 5 posterior draws for these levels:

a: 1 2 3 4 5
b: 6 7 8 9 10
c: 11 12 13 14 15

Then, with “uncertainty”, we sample as follows: for the first posterior draw we choose from (1, 6, 11); for the second, from (2, 7, 12); for the third, from (3, 8, 13); and so on.

If we had enough levels of grp and those levels were normally distributed, then the above approach would come to very similar conclusions as the “gaussian” approach.
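To make the per-draw sampling concrete, here is a small base-R sketch of the scheme described above, using the same toy draws. This is an illustration of the idea, not the actual brms internals; the object names are made up.

```r
# Toy posterior: 5 draws for each of the 3 existing levels a, b, c,
# matching the table above (rows = levels, columns = posterior draws).
draws <- rbind(
  a = c(1, 2, 3, 4, 5),
  b = c(6, 7, 8, 9, 10),
  c = c(11, 12, 13, 14, 15)
)

set.seed(1)
n_draws <- ncol(draws)

# "uncertainty": for each posterior draw s, pick one existing level at
# random and take that level's value at draw s. So draw 1 comes from
# (1, 6, 11), draw 2 from (2, 7, 12), and so on.
picked <- sample(nrow(draws), n_draws, replace = TRUE)
new_level <- draws[cbind(picked, seq_len(n_draws))]
```

Each element of `new_level` is therefore a value from the corresponding column of `draws`, i.e. a per-draw mixture over the existing levels.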


Awesome! That lines up with my understanding. Thanks!

Just to clarify, because I was/am confused by the same thing, I thought I’d ask:

Gaussian: Just sample new values from the MVN RE distribution

old_levels: For predicting new_data, row 1, randomly choose a person from the old_data, and only use their RE distribution (?)

uncertainty: Sample from the /mixture/ of random effect distributions; e.g., for each MCMC sample, sample k from 1:K, get RE[k], continue ?

I.e., Gaussian is using the estimated gaussian prior to generate REs. old_levels conditions each new prediction i on a randomly chosen group k. uncertainty generates samples from the mixture of all K RE distributions, by sampling a group k and taking its value at sample s ?
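For anyone following along, here is a rough base-R simulation of what the three options conceptually do for a (1 | grp) intercept. This is a sketch under made-up toy posterior draws, not the brms implementation.

```r
set.seed(42)
S <- 1000  # number of posterior draws
K <- 3     # number of existing levels of grp

# Toy posterior: per-draw group SD, and per-level intercept draws (S x K).
sd_grp <- abs(rnorm(S, 1, 0.1))
r_grp  <- sapply(seq_len(K), function(k) rnorm(S, 0, sd_grp))

# "gaussian": draw a fresh effect from the implied normal, per draw.
new_gaussian <- rnorm(S, 0, sd_grp)

# "old_levels": pick one existing level once, then reuse all its draws.
k <- sample(K, 1)
new_old_levels <- r_grp[, k]

# "uncertainty": pick a level independently for every draw s and take
# its value at draw s (a mixture over the K existing levels).
ks <- sample(K, S, replace = TRUE)
new_uncertainty <- r_grp[cbind(seq_len(S), ks)]
```

With many well-behaved levels, `new_uncertainty` and `new_gaussian` should look very similar, while `new_old_levels` is conditioned on a single randomly chosen existing level.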


Yes exactly!


Perfect, thanks.

I’m not sure how to make the docs clearer, but I do find them a bit vague on this matter. Thinking of “uncertainty” as a mixture distribution makes it immediately clear how it differs from “old_levels”. Just a suggestion (though I doubt most people actually change or look into this option).


@paul.buerkner would you like a pull request for the doc on this? I regularly change this setting (for MRP style analyses).


That would be perfect, thank you!

Great, followed up with this pull request for those on the thread who are interested.