Simulations for new levels of a factor

blakeobeans · March 30, 2022, 12:33am

Hi, I have built a multilevel logistic model with rstanarm. It includes factors for the student, their course, and a couple of controls, for example whether the classroom has a PC or a Mac. The outcome of interest is whether a student passes a test in that situation (0 or 1). I would like to model the probability of a new student passing the test under various combinations of courses and controls.

My intuition is to use posterior_predict, but I want to make a prediction for a student who doesn’t exist yet. Ideally, I would like to incorporate the hyperparameter on the coefficient for the student variable into my prediction, because this represents the average student effect. I would also like to incorporate the uncertainty around this prediction. In the end, I would like a single posterior distribution for each combination of the other factors I want to test.

What is the correct way to do this in rstanarm? Is there a vignette on this?

I’m happy to provide some code for demonstration purposes but I thought I would just start in plain English.

JimBob · March 30, 2022, 1:31pm

Most of the posterior prediction functions in brms have an argument ‘allow_new_levels’. If you set it to TRUE, they will make predictions for new levels of a grouping factor, such as a student who was not in the original data.
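
As a rough sketch of what a new-level prediction involves, assume the model has only a varying intercept for student (an assumption; the actual model may have more group-level terms, and the notation here is illustrative rather than taken from any package documentation). For posterior draw $s$, with fixed-effect coefficients $\beta^{(s)}$, student-level standard deviation $\sigma_u^{(s)}$, and a covariate combination $x$:

$$
u_{\text{new}}^{(s)} \sim \mathrm{Normal}\!\left(0,\ \sigma_u^{(s)}\right), \qquad
p^{(s)}(x) = \operatorname{logit}^{-1}\!\left(x^\top \beta^{(s)} + u_{\text{new}}^{(s)}\right)
$$

Here Normal is parameterised by its standard deviation. The set of values $p^{(s)}(x)$ across all draws $s$ is then one posterior distribution for that combination of courses and controls, and it reflects both the uncertainty in the coefficients and the student-to-student variation.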

JimBob · March 31, 2022, 10:26am

See here for some further documentation and the ways in which you can have the new levels sampled:

sample_new_levels:
Indicates how to sample new levels for grouping factors specified in re_formula. This argument is only relevant if newdata is provided and allow_new_levels is set to TRUE. If "uncertainty" (default), each posterior sample for a new level is drawn from the posterior draws of a randomly chosen existing level. Each posterior sample for a new level may be drawn from a different existing level such that the resulting set of new posterior draws represents the variation across existing levels. If "gaussian", sample new levels from the (multivariate) normal distribution implied by the group-level standard deviations and correlations. This option may be useful for conducting Bayesian power analysis or predicting new levels in situations where relatively few levels were observed in the old_data. If "old_levels", directly sample new levels from the existing levels, where a new level is assigned all of the posterior draws of the same (randomly chosen) existing level.
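
To make the three options above more concrete, here is a minimal, package-independent sketch in plain Python. It is not brms or rstanarm code. The posterior draws are made up by a hypothetical `simulate_posterior` helper, the model is assumed to have a single varying intercept per student, and all function and field names are invented for illustration.

```python
import math
import random


def inv_logit(x):
    """Map a value on the log-odds scale to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_posterior(n_draws=2000, n_students=8, seed=1):
    """Made-up stand-in for posterior draws from a fitted multilevel model."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        sd = abs(rng.gauss(1.0, 0.1))   # student-level standard deviation
        fixed = rng.gauss(0.5, 0.2)     # linear predictor from course and controls
        effects = [rng.gauss(0.0, sd) for _ in range(n_students)]  # existing students
        draws.append({"fixed": fixed, "sd": sd, "effects": effects})
    return draws


def new_level_effects(draws, method, rng):
    """Return one new-student effect per posterior draw."""
    n_levels = len(draws[0]["effects"])
    if method == "gaussian":
        # Draw from the normal distribution implied by each draw's group-level SD.
        return [rng.gauss(0.0, d["sd"]) for d in draws]
    if method == "uncertainty":
        # Each draw borrows from a (possibly different) randomly chosen existing level.
        return [d["effects"][rng.randrange(n_levels)] for d in draws]
    if method == "old_levels":
        # Pick one existing level at random and reuse all of its draws.
        k = rng.randrange(n_levels)
        return [d["effects"][k] for d in draws]
    raise ValueError("unknown method: " + repr(method))


def predict_new_student(draws, method, seed=2):
    """Pass probability for a new student, one value per posterior draw."""
    rng = random.Random(seed)
    u_new = new_level_effects(draws, method, rng)
    return [inv_logit(d["fixed"] + u) for d, u in zip(draws, u_new)]


draws = simulate_posterior()
for method in ("uncertainty", "gaussian", "old_levels"):
    probs = predict_new_student(draws, method)
    print(method, "mean pass probability:", round(sum(probs) / len(probs), 3))
```

Each call returns one probability per posterior draw, so the result is a full distribution for that combination of covariates rather than a single number. These are probabilities on the expected-value scale; simulating 0/1 outcomes would additionally require a Bernoulli draw for each probability.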
