Fitted.brmsfit's output varies for the same model and data

It’s a multilevel model with a categorical response, logit link. Brms version is 2.19.0. When I use fitted.brmsfit to make a prediction for a hypothetical new datapoint for which the group-level factor in newdata equals NA, the predicted values differ across function calls.

This does not happen when I use re_formula = NA or give the group-level factor a value that is actually observed in the training data. But it does also happen with re_formula = NULL, scale = "linear" when the group-level factor in newdata equals NA.

Based on the above, I infer that this stochasticity must be related to the way in which the link scale parameter values are averaged across draws. But shouldn’t this average be identical across calls given that it’s based on all the draws? In fact, shouldn’t it be simply something akin to rowMeans(posterior_samples(mod))?

My mind is boggling real bad, causing anxiety and anguish.

brms does not include parameters for new groups in the model, and thus the posterior sample does not contain draws for the parameters for a new group.

When you afterward, ask draws for the new group, brms samples them in R. If you don’t set the random number generator seed before the call, the by the usual logic you should get different draws in each call.

If you include re_formula=NA you are asking to not include group specific parameters (old or new). If you ask predictions for existing group, then the parameters for those were included in the original model and thus sampled using Stan.

If you don’t want stochasticity, fix the RNG seed before calls. However, if the results for your quantity of interest are sensitive to this stochasticity, you may have a problem with posterior sample size or with your model.

2 Likes

Thanks Aki!