Estimating Subject Level Effects for New Subjects in IRT, or multilevel models in general

Sorry for taking too long to answer - your question is relevant and well written.

You are AFAIK correct. Unfortunately, how to actually avoid the need to refit is AFAIK an open research problem. I think none of the other options you give is great, as you can see from how much they differ from the full model approach.

Some additional approaches to consider:

  • One can use the posterior of the parameters from the first fit as a prior for the parameters in a new fit that has only the new subjects as data (and should thus be fast to fit). There are a number of traps along the way, but it is something I was able to get to work, at least roughly (see my StanCon 18 submission). In particular, if you happen to have a lot of data for the initial model, then the posterior might be sufficiently close to multivariate normal to let you do this. I think a good summary of the important points is at Composing Stan models (posterior as next prior) and Using posteriors as new priors - #6 by betanalpha. I also remember (but cannot find the link) someone mentioning fitting a neural network to the posterior and passing the neural network params to build a prior for the next model. The problem is that passing a joint distribution with between-parameter correlations to brms would require some hacking around (you could likely achieve that via stanvars though).

  • Since one new data point should not change the posterior very much, you could somewhat speed up refitting of the whole model by passing the adaptation information from the original model (step size, mass matrix). I am not sure this is easily achievable via brms, but it might be worth some investigation. You could also use posterior means of the original fit as inits. If you can pull this off, you could most likely use a much shorter warmup (or even avoid warmup completely).

  • The “predict and filter” approach seems somewhat related to importance sampling. Now, this is a wild guess (I’ve never done this or seen it done), but maybe using the old posterior as a proposal distribution and then using the likelihood of the full model for weighting could work quite well. Unfortunately, brms AFAIK doesn’t let you just compute the likelihood, so you would have to either rewrite the likelihood yourself or hack around brms to expose it…

  • Also [1412.4869] Expectation propagation as a way of life: A framework for Bayesian inference on partitioned data seems related, but I don’t understand the method and once again I don’t think there is a ready-made implementation.
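
To make the first and third ideas above a bit more concrete, here is a minimal toy sketch in plain Python. It is not brms or Stan code and not something I have run on an IRT model. Everything in it is an assumption made for illustration: a single normal mean with known noise sd, made-up data, and the helper names normal_posterior, log_lik and reweight. Because the old posterior is exactly normal in this toy, both shortcuts land very close to the full refit; a real hierarchical posterior would be much less forgiving.

```python
import math
import random
import statistics


def normal_posterior(prior_mean, prior_sd, data, noise_sd):
    """Conjugate posterior (mean, sd) for a normal mean with known noise sd."""
    prior_prec = 1.0 / prior_sd**2
    post_prec = prior_prec + len(data) / noise_sd**2
    post_mean = (prior_prec * prior_mean + sum(data) / noise_sd**2) / post_prec
    return post_mean, math.sqrt(1.0 / post_prec)


def log_lik(theta, data, noise_sd):
    """Log-likelihood of data under Normal(theta, noise_sd), up to a constant."""
    return sum(-0.5 * ((y - theta) / noise_sd) ** 2 for y in data)


def reweight(draws, new_data, noise_sd):
    """Self-normalised importance weights with the old posterior as proposal.

    With that proposal, the weight of a draw is proportional to the
    likelihood of the new data only (the old data cancel out).
    """
    log_w = [log_lik(t, new_data, noise_sd) for t in draws]
    shift = max(log_w)  # subtract the max for numerical stability
    w = [math.exp(lw - shift) for lw in log_w]
    total = sum(w)
    return [wi / total for wi in w]


rng = random.Random(1)
noise_sd = 1.0
old_data = [rng.gauss(0.5, noise_sd) for _ in range(200)]  # made-up "first fit" data
new_data = [rng.gauss(0.5, noise_sd) for _ in range(5)]    # made-up new observations

# Stand-in for MCMC draws from the first fit.
old_mean, old_sd = normal_posterior(0.0, 10.0, old_data, noise_sd)
draws = [rng.gauss(old_mean, old_sd) for _ in range(20000)]

# Posterior as prior: moment-match a normal to the draws, update on new data only.
pap_mean, pap_sd = normal_posterior(
    statistics.fmean(draws), statistics.stdev(draws), new_data, noise_sd
)

# Importance sampling: reweight the old draws by the new-data likelihood.
weights = reweight(draws, new_data, noise_sd)
is_mean = sum(w * t for w, t in zip(weights, draws))
ess = 1.0 / sum(w * w for w in weights)  # effective sample size

# Reference: full refit on all the data.
exact_mean, exact_sd = normal_posterior(0.0, 10.0, old_data + new_data, noise_sd)

print(f"posterior-as-prior mean:  {pap_mean:.4f}")
print(f"importance-sampling mean: {is_mean:.4f}")
print(f"full refit mean:          {exact_mean:.4f}")
print(f"effective sample size:    {ess:.0f} of {len(draws)}")
```

The effective sample size is the thing to watch here: if the new data pull the posterior far from the old one, a few draws get almost all of the weight, and that would be a sign that a full refit is due.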

All of the approaches I describe would IMHO likely suffer from some sort of drift, so you would need to refit the whole model occasionally anyway. All in all, I am not sure avoiding the refitting is worth the hassle. Focusing on speeding up the refitting (e.g. by using the recently added support for within-chain parallelization in brms or by testing whether you can get away with shorter warmup/sampling) could possibly be a better investment.

Best of luck with your project!
