Fit random effects for new levels without refitting entire model?

Is it possible to fit new random effects for new levels, and then make predictions? I have data with 4 timepoints measured per participant, fit in brms using a linear mixed model with random slopes and intercepts.

Now, I want to use this model to make predictions in new participants. Imagine a new participant who thus far has 3 timepoints measured. Is there a way to fit new random effects for this new participant based on the first 3 timepoints, to predict the value of the 4th timepoint, without having to refit the full model?

Thank you!

That’s an interesting question! I think it might depend on the structure of your original model. As you probably know, you can make predictions for new groups by drawing the group-level parameters from the distributions defined by the hyperparameters. In your case, you have the posterior distribution of the parameters of the distribution of slopes and intercepts, so you could draw new ones for new, unobserved groups.
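To make the "draw new group-level parameters" step concrete, here is a minimal NumPy sketch. The posterior draws below are simulated stand-ins, and every name in it (`b_pop`, `sd_int`, `sd_slope`, `corr`, `sigma`, `draw_new_group_effects`) is hypothetical; in practice you would extract the corresponding draws from your fitted model. As far as I know, brms does this step for you via `posterior_predict()` with `allow_new_levels = TRUE`, so this is only meant to show what happens under the hood.

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 1000

# Hypothetical stand-ins for posterior draws from the fitted model.
b_pop = rng.normal([10.0, 2.0], 0.1, size=(n_draws, 2))  # population intercept, slope
sd_int = np.abs(rng.normal(1.5, 0.05, n_draws))          # sd of random intercepts
sd_slope = np.abs(rng.normal(0.5, 0.02, n_draws))        # sd of random slopes
corr = np.clip(rng.normal(0.3, 0.05, n_draws), -0.99, 0.99)  # intercept-slope correlation
sigma = np.abs(rng.normal(1.0, 0.03, n_draws))           # residual sd


def draw_new_group_effects(sd_int, sd_slope, corr, rng):
    """Draw one (intercept, slope) deviation per posterior draw.

    Uses the 2x2 Cholesky factor of the group-level covariance matrix,
    so each draw respects that draw's own sds and correlation.
    """
    z = rng.standard_normal((len(sd_int), 2))
    u0 = sd_int * z[:, 0]
    u1 = sd_slope * (corr * z[:, 0] + np.sqrt(1.0 - corr**2) * z[:, 1])
    return np.column_stack([u0, u1])


u_new = draw_new_group_effects(sd_int, sd_slope, corr, rng)

# Posterior predictive draws at t = 4 for a completely unobserved participant.
t_new = 4.0
mu4 = (b_pop[:, 0] + u_new[:, 0]) + (b_pop[:, 1] + u_new[:, 1]) * t_new
y4 = rng.normal(mu4, sigma)
```

Because no data from the new participant enter this calculation, the spread of `y4` reflects the full between-participant variation.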

But what I understand is that you want to get predictions for partially observed groups. In other words, you want the distribution of possible observations at t = 4, given both the observations at t = 1, 2, and 3 and the hyperparameters for the distribution of slopes and intercepts.

Ideally, of course, you would fit the model again, so that the new observations could inform the distribution of the hyperparameters. But if you have a massive sample size in the original model, three new numbers are unlikely to make a difference.

I’m just speculating, but would it be possible to get close to the right answer here? Two ideas occur to me. First, I wonder if you could fit a small model to just the first three timepoints for your new participant, but using the hyperpriors to set a very “informative” prior. I mean, set the prior parameters for the new slope and intercept to the MAP estimates of the hyperparameters.

Second, I wonder if an even faster way, again depending on the model structure, would be to do the same kind of thing using conjugate priors, along the lines of the Wikipedia article on Bayesian linear regression.
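A rough sketch of that conjugate idea, assuming a Gaussian likelihood with the residual sd treated as known and plugged in as a point estimate. The prior mean `m0` and covariance `S0` stand for point estimates of the population-level coefficients and the group-level covariance; all numbers and function names here are made up for illustration, and this ignores the posterior uncertainty in the hyperparameters.

```python
import numpy as np


def conjugate_update(m0, S0, X, y, sigma):
    """Posterior mean and covariance of regression coefficients.

    Prior: beta ~ N(m0, S0). Likelihood: y ~ N(X @ beta, sigma^2 I),
    with sigma treated as known.
    """
    S0_inv = np.linalg.inv(S0)
    Sn = np.linalg.inv(S0_inv + X.T @ X / sigma**2)
    mn = Sn @ (S0_inv @ m0 + X.T @ y / sigma**2)
    return mn, Sn


def predict(mn, Sn, x_new, sigma):
    """Mean and sd of the posterior predictive at design row x_new."""
    mean = x_new @ mn
    var = x_new @ Sn @ x_new + sigma**2
    return mean, np.sqrt(var)


# Hypothetical point estimates taken from the big model.
m0 = np.array([10.0, 2.0])                      # population intercept, slope
sd_int, sd_slope, corr = 1.5, 0.5, 0.3
S0 = np.array([[sd_int**2, corr * sd_int * sd_slope],
               [corr * sd_int * sd_slope, sd_slope**2]])
sigma = 1.0                                     # residual sd

# Made-up observations for the new participant at t = 1, 2, 3.
t_obs = np.array([1.0, 2.0, 3.0])
y_obs = np.array([13.1, 15.8, 19.2])
X = np.column_stack([np.ones_like(t_obs), t_obs])

mn, Sn = conjugate_update(m0, S0, X, y_obs, sigma)
pred_mean, pred_sd = predict(mn, Sn, np.array([1.0, 4.0]), sigma)
```

One way to bring back some of the hyperparameter uncertainty would be to run the same update once per posterior draw and pool the resulting predictive draws, at the cost of a loop over draws.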

This wouldn’t benefit from the full advantages of a hierarchical model, but I feel like it should still help you get a better prediction than fitting just those three points on their own. What do you think?

Your understanding is correct, and I like your wording of “partially observed” groups. The use case is exactly as you describe. We have a large set of training cases and want to apply the model reproducibly to new cases to make dynamic predictions as the data are being collected in real time. Having to refit the full model for each new observation is feasible now, but it will become increasingly computationally expensive over time.

Using the fitted data to encode informative priors is a good idea to make a reasonable approximation! I will give that a try. I guess it requires fitting some kind of parametric distribution to the posterior draws, and then using those distributions as the prior? I will also look into the conjugate prior approach, though I am not sure it will work for the specific model structure.
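For the "fit a parametric distribution to the posterior draws" step, simple moment matching might be enough as a first pass. The sketch below assumes a multivariate normal is adequate for the population-level coefficients and a lognormal for a positive sd parameter; the draws are simulated placeholders and the helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated placeholders for posterior draws from the fitted model.
draws_b = rng.multivariate_normal([10.0, 2.0],
                                  [[0.01, 0.002], [0.002, 0.01]],
                                  size=4000)
draws_sd_int = rng.lognormal(mean=np.log(1.5), sigma=0.05, size=4000)


def fit_mvn(draws):
    """Moment-match a multivariate normal: returns (mean vector, covariance)."""
    return draws.mean(axis=0), np.cov(draws, rowvar=False)


def fit_lognormal(draws):
    """Moment-match a lognormal on the log scale: returns (meanlog, sdlog)."""
    logs = np.log(draws)
    return logs.mean(), logs.std(ddof=1)


b_mean, b_cov = fit_mvn(draws_b)
sd_meanlog, sd_sdlog = fit_lognormal(draws_sd_int)
```

The fitted parameters could then be written into the prior statements of the small per-participant model. It is worth checking a histogram of the draws against the fitted density first, since skewed or multimodal posteriors would not be captured by these families.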

Thank you for the detailed response. I will give these ideas a try and report back.

Given your description of the use case, I think you might be able to do this more formally with PSIS. As you add a single new observation, the new posterior that you want is the same thing as the LOO posterior where we “leave out” an “observation” whose log-likelihood contribution is the negative of the log-likelihood contribution of the new observation. Maybe @avehtari can advise more, but this is how I would approach your problem. Doing it this way has the major advantage that you will get diagnostics for when this approximation is likely to fail badly.


Yes, it would be possible to make a standalone generated quantities block, which would generate the additional draws from the prior of the “random effect”, and compute the log likelihood given the new observations. Although add-more-data is easier for IS than leave-out-some-data, it’s still likely that the prior and posterior for that specific “random effect” are so different that even PSIS can’t make the importance weighting stable. If there is (are) only one (or two) “random effect(s)” per group, then it is possible to use quadrature integration inside the generated quantities block to get stable importance weights (as demonstrated in the Roaches cross-validation demo).
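The importance-weighting idea can be sketched outside Stan as follows. This is plain self-normalized importance sampling with an effective-sample-size check, not PSIS: there is no Pareto smoothing and no k-hat diagnostic here, and in practice the log weights would be passed to a PSIS implementation such as `psis()` in the R loo package. The posterior draws are simulated stand-ins and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n_draws = 4000

# Simulated stand-ins for posterior draws from the fitted model.
b_pop = rng.normal([10.0, 2.0], 0.1, size=(n_draws, 2))
sd_u = np.abs(rng.normal([1.5, 0.5], 0.03, size=(n_draws, 2)))
sigma = np.abs(rng.normal(1.0, 0.03, n_draws))

# Random effects for the new participant, drawn from their prior.
# (Intercept-slope correlation is set to zero here only to keep the sketch short.)
u = sd_u * rng.standard_normal((n_draws, 2))


def log_normal_pdf(y, mu, sd):
    """Elementwise log density of N(mu, sd^2) at y."""
    return -0.5 * np.log(2.0 * np.pi) - np.log(sd) - 0.5 * ((y - mu) / sd) ** 2


def importance_weights(b_pop, u, sigma, t_obs, y_obs):
    """Normalized weights proportional to the likelihood of the new data."""
    intercept = b_pop[:, 0] + u[:, 0]
    slope = b_pop[:, 1] + u[:, 1]
    mu = intercept[:, None] + slope[:, None] * t_obs[None, :]
    log_w = log_normal_pdf(y_obs[None, :], mu, sigma[:, None]).sum(axis=1)
    log_w = log_w - log_w.max()          # stabilize before exponentiating
    w = np.exp(log_w)
    w = w / w.sum()
    ess = 1.0 / np.sum(w**2)             # effective sample size of the weights
    return w, ess


# Made-up observations for the new participant at t = 1, 2, 3.
t_obs = np.array([1.0, 2.0, 3.0])
y_obs = np.array([13.1, 15.8, 19.2])
w, ess = importance_weights(b_pop, u, sigma, t_obs, y_obs)

# Weighted resampling gives approximate predictive draws at t = 4.
mu4 = (b_pop[:, 0] + u[:, 0]) + (b_pop[:, 1] + u[:, 1]) * 4.0
idx = rng.choice(n_draws, size=n_draws, p=w)
y4 = rng.normal(mu4[idx], sigma[idx])
```

A small `ess` relative to `n_draws` is a sign that the prior draws of the random effects rarely land where the new data put the posterior, which is the instability described above.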