Understanding sample_new_levels = "uncertainty", "gaussian", and "old_levels"

The documentation for brms::extract_draws() says this about the sample_new_levels option:

Indicates how to sample new levels for grouping factors specified in re_formula. This argument is only relevant if newdata is provided and allow_new_levels is set to TRUE. If "uncertainty" (default), include group-level uncertainty in the predictions based on the variation of the existing levels. If "gaussian", sample new levels from the (multivariate) normal distribution implied by the group-level standard deviations and correlations. This option may be useful for conducting Bayesian power analysis. If "old_levels", directly sample new levels from the existing levels.

I’m having a hard time understanding what the differences between the three options (“uncertainty”, “gaussian”, and “old_levels”) are.

How does “uncertainty” include group-level uncertainty based on the variation of the existing levels? Does it randomly sample from the existing group levels?

For “gaussian”, am I correct that for a term like (1 | grp), this option would return something like rnorm(n_draws, 0, sd_grp__Intercept)?

For “old_levels”, is one of the group levels chosen randomly at the start, and then only its draws are used?


I’ve read this discussion on the brms GitHub and I’ve taken a look at the code for extract_draws.R, but I’m still unsure about this.


Your understanding of “gaussian” and “old_levels” is correct.

“uncertainty” can be thought of as fulfilling a similar purpose to “gaussian”, but instead of sampling from a Gaussian distribution (which may or may not be a good representation of the distribution of the group-level effects), we sample directly from the distribution implied by the old levels, in a bootstrap-like fashion.

Hi Paul,

Could you clarify what draws are being bootstrapped? I can’t tell from the source code. For example, what would the bootstrapping look like for a simple Gaussian model like y ~ (1 | grp) when I call posterior_linpred(fit, newdata = data.frame(grp = "new_group"), allow_new_levels = TRUE)?

Perhaps “bootstrapping” is not the right term. Here is what I mean. Suppose grp has 3 levels, call them a, b, and c and suppose we have 5 posterior draws for these levels:

a: 1 2 3 4 5
b: 6 7 8 9 10
c: 11 12 13 14 15

Then, in “uncertainty”, we sample as follows. For the first posterior draw, we choose from (1, 6, 11), in the second we choose from (2, 7, 12), in the third we choose from (3, 8, 13), etc.

If we had enough levels of grp and those levels were normally distributed, then the above approach would come to very similar conclusions as the “gaussian” approach.
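The column-wise sampling described above can be sketched directly in R (a toy illustration of the idea, not the actual brms internals; the matrix mirrors the example draws):

```r
# Toy example: 5 posterior draws for the group-level effects of the
# three existing levels a, b, and c (rows = levels, columns = draws).
draws <- rbind(a = 1:5, b = 6:10, c = 11:15)
n_draws <- ncol(draws)
set.seed(1)

# "uncertainty": for each posterior draw s, pick one existing level at
# random and take its value at that same draw (a column-wise choice).
uncertainty <- sapply(seq_len(n_draws), function(s) {
  draws[sample(nrow(draws), 1), s]
})

# "old_levels": pick one existing level once, then use all of its draws.
old_levels <- draws[sample(nrow(draws), 1), ]
```

So for draw 1, "uncertainty" chooses among (1, 6, 11); for draw 2, among (2, 7, 12); and so on, while "old_levels" commits to a single row of the matrix.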


Awesome! That lines up with my understanding. Thanks!

Just to clarify, because I was/am confused by the same thing, I thought I’d ask:

Gaussian: Just sample new values from the MVN RE distribution

old_levels: For predicting new_data, row 1, randomly choose a person from the old_data, and only use their RE distribution (?)

uncertainty: Sample from the *mixture* of random effect distributions; e.g., for each MCMC sample, sample k from 1:K, get RE[k], continue?

I.e., Gaussian is using the estimated gaussian prior to generate REs. old_levels conditions each new prediction i on a randomly chosen group k. uncertainty generates samples from the mixture of all K RE distributions, by sampling a group k and taking its value at sample s ?


Yes exactly!


Perfect, thanks.

I do find the docs a bit vague on this matter, though I’m not sure how to make them clearer. Thinking of “uncertainty” as a mixture distribution makes it immediately clear how it differs from “old_levels”. Just a suggestion (though I doubt most people actually change or look into this option).


@paul.buerkner would you like a pull request for the doc on this? I regularly change this setting (for MRP style analyses).


That would be perfect, thank you!

Great, followed up with this pull request for those on the thread who are interested.

Hi All,

I’ve read the preceding discussion, and it seems to me that neither “uncertainty,” “gaussian,” nor “old_levels” matches exactly what I want to do, described below. But perhaps my understanding is incorrect - is there a simple option to accomplish my task in brms? Thank you!

The task:
After fitting a multilevel model, I want to make predictions for new data that includes new levels of the grouping variable, using posterior_predict() (or posterior_epred()) with allow_new_levels = TRUE. Now I must choose an option for sample_new_levels. I would like each unique new level in the new data to correspond to exactly one Gaussian draw from the group-level standard deviations and correlations, not a sample from the existing group levels. This narrows the choice down to sample_new_levels = "gaussian". However, with sample_new_levels = "gaussian", each posterior draw for a given row of the new data derives from a different Gaussian draw from the group-level uncertainty, whereas I would like every posterior draw for a given row to derive from exactly one fixed Gaussian draw. Better yet, every row of the new data that shares the same new level of the grouping variable would produce posterior draws that all derive from that same single Gaussian draw.

At the moment, this does not appear to be included in the built-in options. Am I mistaken? Is there a built-in option to accomplish this?

Thank you!

Your assessment is correct, but what you want to do is not well defined, because when you pick your one Gaussian draw to use across iterations, you haven’t specified which Gaussian you want to draw from. The hyperparameters vary across iterations.

I see - good point. However, it’s unsatisfying to think that a posterior predictive distribution for a single randomly gaussian-sampled new level is not possible in a rigorous way. Perhaps my task is only ill-defined if we insist that the posterior draws correspond one-to-one to the MCMC iterations. Instead, consider the following process, which could be applied to each new level in the new data:

  1. Sample an MCMC iteration to obtain hyperparameters that define a Gaussian distribution.
  2. Draw once from this Gaussian distribution to represent the new level in the new data. Call this draw “A.”
  3. Define a weight for each MCMC iteration proportional to the probability density of “A” according to each MCMC iteration’s unique hyperparameters.
  4. To generate each ultimate posterior draw for this new level in the new data, sample an MCMC iteration according to their weights, and use its primary parameters along with “A.”
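Under the simplifying assumption of a single (1 | grp) intercept term, the four steps might be sketched like this (all object names here are hypothetical stand-ins, not actual brms output):

```r
# Toy sketch of the proposed procedure for a (1 | grp) intercept model.
# sd_grp stands in for the posterior draws of the group-level SD.
set.seed(42)
n_iter <- 4000
sd_grp <- abs(rnorm(n_iter, 1, 0.2))

# Step 1: sample one MCMC iteration to define the Gaussian.
i <- sample(n_iter, 1)

# Step 2: draw the new level's effect "A" once from that Gaussian.
A <- rnorm(1, 0, sd_grp[i])

# Step 3: weight each iteration by the density of A under its own
# hyperparameters.
w <- dnorm(A, 0, sd_grp)
w <- w / sum(w)

# Step 4: resample iterations according to those weights; each
# resampled iteration's primary parameters would then be combined
# with the fixed value A to produce the posterior draws.
idx <- sample(n_iter, n_iter, replace = TRUE, prob = w)
```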

Of course, the result would not be predictions that encompass uncertainty for all future data sets with new levels (as sample_new_levels=“uncertainty” or “gaussian” generate), but rather the result would be predictions for what one future data set might look like, following the structure and number of new levels provided in the new data. And the result might differ markedly from run to run.

This may extend beyond what is easily accomplished with built-in brms functionality, but still: what do you think? Does this seem a reasonable approach? Useful? I’m happy to explain my use for it, if that would be of interest.

Thank you again for your thoughtful input!


I think knowing your use case might help. The procedure you’ve outlined definitely yields something, but I think that something is still sensitive to which posterior iteration you choose in step 1. Especially if the hyperparameters are highly uncertain, it is problematic to privilege a single posterior iteration in this way. Hopefully we can come up with a way to get you what you need while still integrating over the posterior uncertainty in a sufficiently nice way.

Of course! Here is my use case. I collaborate with experimentalists who collect single-cell data from lab mice and want to know whether experimentally manipulated predictors affect attributes of the individual cells. Which mouse each cell came from is the grouping variable. This is crucial, because “identical” lab mice actually vary quite a bit! Understandably, my collaborators can be vexed when they sometimes see the hypothesized impact of a predictor, but not consistently.

Frequently, it turns out that the experimentally manipulated predictors indeed exert a significant impact on cellular attributes generally, but the magnitude of this impact is rather small compared to the magnitude of natural variation among mice. After carefully constructing a model that fits the data well, I can show them that:
a) The model confirms that the effects of the predictors are “significant.”
b) The model estimates that variation among mice is large compared to the effects of the predictors.

What I want to further demonstrate is that a + b = the observed semi-consistent results among mice. In other words, I want to show that the model not only fits the current data well, but also predicts and explains this vexing semi-consistency for future data, too.

Posterior predictions that incorporate the full scope of uncertainty for many new mice don’t demonstrate this concept clearly to the untrained eye. Instead, a series of posterior predictions - each incorporating the uncertainty only for a specific new mouse - might do the trick. It would just be important never to let the latter be mistaken for the former.


If I were in your place, I would do the entire demonstration start-to-finish per iteration, with sample_new_levels = "gaussian". If you can come up with either a numerical or a graphical way to illustrate this semi-consistency, then you can show the posterior distribution for the semi-consistency for unsampled mice across the full variation in the hyperparameters.

Suppose that your collaborators are fitting (gasp) a frequentist fixed-effects ANOVA and observing a wide range of p-values across repeated experiments. Then if you want you could fit the data with your model, and then per-iteration simulate the desired sample size of new mice, run the fixed-effects ANOVA on the result, and obtain a p-value from a hypothetical replicate simulation. Here, the p-value from a fixed-effects ANOVA is just one possible function f of the data, and you can plug in whatever it is that your colleagues are actually doing to get a posterior distribution for how their viewpoint on the data would vary across replicated datasets assuming that your model is correct.
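A minimal sketch of this per-iteration replication idea, assuming a fitted brms model `fit` with a formula like y ~ condition + (1 | mouse) (the names `n_new_mice` and `new_design` are hypothetical):

```r
library(brms)

# Hypothetical design for a replicate experiment with new mice.
n_new_mice <- 8
new_design <- expand.grid(
  mouse     = paste0("new_", seq_len(n_new_mice)),
  condition = c("control", "treated")
)

# One simulated replicate dataset per posterior draw, with fresh
# Gaussian draws for the unobserved mice.
yrep <- posterior_predict(
  fit, newdata = new_design,
  allow_new_levels = TRUE, sample_new_levels = "gaussian"
)

# For each replicate, apply whatever summary the collaborators use,
# e.g. a fixed-effects ANOVA p-value, to obtain its posterior
# distribution across hypothetical future experiments.
p_vals <- apply(yrep, 1, function(y) {
  d <- cbind(new_design, y = y)
  summary(aov(y ~ condition, data = d))[[1]][["Pr(>F)"]][1]
})
```

The histogram of `p_vals` then shows how often a replicate experiment would look "significant" under the fitted model.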


That’s clever! Not only does it accomplish the forward simulation of new levels that remain consistent across experimental conditions allowing an illustrative visualization of semi-consistent experimental results, but it also quantifies the semi-consistency with its own posterior distribution. This is especially nice, since “semi-consistent” is itself a rather ill-defined concept.

Thanks again for your insight!