# Simulations for new levels of a factor

Hi, I have built a multi-level logistic model with RStanARM. It contains variables on student names, their courses, and a couple of controls as factors, for example whether there is a PC or Mac in the classroom. The outcome of interest is whether a student passes a test in that situation (0 or 1). I would like to model the probability of a new student passing the test under various combinations of courses and controls.

My intuition is to use `posterior_predict`, but the issue is that I want to make a prediction for a student who doesn’t exist yet. Ideally, I would like to incorporate the hyperparameter on the coefficient for the student variable into my prediction because this represents the average student effect. I would also like to incorporate the uncertainty around this prediction. In the end, I would like a single posterior distribution for each combination of the other factors I want to test.

What is the correct way to do this in RStanARM? I’m wondering if there is a vignette on this?

I’m happy to provide some code for demonstration purposes but I thought I would just start in plain English.

Most of the posterior prediction functions in brms have an argument `allow_new_levels`. If you set it to `TRUE`, they will make predictions for new levels of a grouping factor, such as a student who is not in the data.
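Here is a minimal sketch of what that call could look like. It assumes the model was fit with brms, since `allow_new_levels` and `sample_new_levels` are the arguments quoted from the brms documentation in this reply. The factor names (`course`, `computer`, `student`), their levels, and the `predict_new_student()` helper are made up for illustration. For an rstanarm fit, as far as I recall, the `posterior_predict` documentation says `newdata` may simply contain new levels of the grouping factors, so it is worth checking there first.

```r
# Hypothetical factor names and levels; replace them with the ones in your model.
newdata <- expand.grid(
  course   = c("algebra", "biology"),
  computer = c("PC", "Mac"),
  stringsAsFactors = FALSE
)

# A student label that does not occur in the data used to fit the model
newdata$student <- "new_student"

# Hypothetical helper: `fit` is assumed to be a brmsfit for a Bernoulli model.
# posterior_epred() returns a draws-by-rows matrix of pass probabilities.
predict_new_student <- function(fit, newdata, how = "uncertainty") {
  brms::posterior_epred(
    fit,
    newdata = newdata,
    allow_new_levels = TRUE,
    sample_new_levels = how
  )
}

# Usage, once a fitted model `fit` exists:
# draws <- predict_new_student(fit, newdata)
# apply(draws, 2, quantile, probs = c(0.025, 0.5, 0.975))
```

Each column of the result would then be one posterior distribution per combination of course and computer, which is what the question asks for.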

See here for some further documentation and the ways in which you can have the new levels sampled:

`sample_new_levels`:
Indicates how to sample new levels for grouping factors specified in `re_formula`. This argument is only relevant if `newdata` is provided and `allow_new_levels` is set to `TRUE`. If `"uncertainty"` (default), each posterior sample for a new level is drawn from the posterior draws of a randomly chosen existing level. Each posterior sample for a new level may be drawn from a different existing level such that the resulting set of new posterior draws represents the variation across existing levels. If `"gaussian"`, sample new levels from the (multivariate) normal distribution implied by the group-level standard deviations and correlations. This option may be useful for conducting Bayesian power analysis or predicting new levels in situations where relatively few levels were observed in the old data. If `"old_levels"`, directly sample new levels from the existing levels, where a new level is assigned all of the posterior draws of the same (randomly chosen) existing level.
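To make the three options more concrete, here is a rough base R sketch of the idea for a model with a single student intercept. It does not call brms and is not its implementation. The "posterior draws" are simulated stand-ins with made-up values, and all object names are hypothetical.

```r
set.seed(1)
n_draws    <- 4000
n_students <- 30

# Simulated stand-ins for posterior draws (all values are made up)
b0         <- rnorm(n_draws, 0.5, 0.1)        # population-level intercept, logit scale
sd_student <- abs(rnorm(n_draws, 1, 0.1))     # group-level SD of the student effect
student_mean <- rnorm(n_students, 0, 1)       # centre of each existing student's effect
r_student <- matrix(
  rep(student_mean, each = n_draws) + rnorm(n_draws * n_students, 0, 0.2),
  nrow = n_draws, ncol = n_students           # rows = draws, columns = students
)

# "gaussian": in each draw, sample the new effect from N(0, sd_student)
r_gauss <- rnorm(n_draws, 0, sd_student)

# "uncertainty": each draw borrows from a randomly chosen existing student
pick  <- sample.int(n_students, n_draws, replace = TRUE)
r_unc <- r_student[cbind(seq_len(n_draws), pick)]

# "old_levels": one randomly chosen existing student supplies all draws
r_old <- r_student[, sample.int(n_students, 1)]

# Pass probability for the new student under each scheme
p_gauss <- plogis(b0 + r_gauss)
p_unc   <- plogis(b0 + r_unc)
p_old   <- plogis(b0 + r_old)

# Compare the spread of the three predictive distributions
sapply(list(gaussian = p_gauss, uncertainty = p_unc, old_levels = p_old),
       quantile, probs = c(0.025, 0.5, 0.975))
```

In this sketch the `"old_levels"` result is much narrower than the other two, because it reuses one existing student's draws instead of reflecting the variation across students. For a student who does not exist yet, `"uncertainty"` or `"gaussian"` is probably closer to what the question is after.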