Updating Posteriors in Light of More Data

We often seem to end up in a setting where we have a Stan model that takes a long time to fit, and then someone comes along with a small set of additional data. We'd like the updated posterior, but a full refit feels like overkill. It transpires that, in this setting, we can use importance sampling to update the posterior very quickly.
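A minimal sketch of the idea, assuming the new data $y_{\text{new}}$ are conditionally independent of the original data $y$ given the parameters $\theta$: draws $\theta_i$ from the already-fitted posterior $p(\theta \mid y)$ can be importance weighted to target the updated posterior $p(\theta \mid y, y_{\text{new}})$ with weights

$$
w_i \propto \frac{p(\theta_i \mid y, y_{\text{new}})}{p(\theta_i \mid y)} \propto p(y_{\text{new}} \mid \theta_i),
$$

i.e. each existing draw is reweighted by the likelihood of the new data, and no refitting is needed.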

@alphillips has made a small package that does this in R, and we have a Python equivalent (using BridgeStan) that we plan to make public. We find these little packages useful. See the GitHub repo codatmo/stanIncrementalImportanceSampling: a basic R package to perform importance sampling to update fitted Stan models with new data.

We are pondering how best to make this part of the Stan Universe. With that in mind, we'd encourage users with an interest in this topic to take a look at the GitHub repo and comment on this post. Once we have a view of the community's thoughts on what would be most useful, we'll write a design document with a view to adapting what we have to give users what they want.


Cool! You could use the posterior package for the resampling, as it has some better alternatives to simple resampling and makes it easy to handle all parameters while keeping the posterior object compatible with, e.g., bayesplot. See posterior::weight_draws() and posterior::resample_draws(). There is also a PR for posterior to add the Pareto-$\hat{k}$ diagnostic, which would make it easy to diagnose the reliability of the importance sampling. You could also further improve how much the posterior can change by using importance weighted moment matching.
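For example, a rough sketch of how that could look (fit_draws and log_lik_new, a vector of log-likelihoods of the new data evaluated at each posterior draw, are placeholder names for illustration, not part of the package):

library(posterior)

# Attach log importance weights: the log-likelihood of the new data at each draw
weighted_draws <- weight_draws(fit_draws, weights = log_lik_new, log = TRUE)

# Resample according to the stored weights; the result is still a posterior
# draws object, so it stays compatible with e.g. bayesplot
updated_draws <- resample_draws(weighted_draws, method = "stratified")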


I haven’t had a chance to look at the code yet, but this sounds quite useful. I also agree with @avehtari’s suggestions.

Depends on what you mean by Stan Universe. There is a somewhat recent formal process, implemented by a previous @SGB, where developers can vote on making it part of the Stan project (moving the code into stan-dev on GitHub, listing it on the Stan website, etc.). I think the inclusion of posteriordb was the first time it was used.

Or, if you just want it to be used more widely by Stan users, you could submit a case study for the Stan website that demonstrates how to use it. That would get more eyes on it, and the process is easy. I would probably do this either way and decide about the formal process afterwards, if you're interested in that.


It would be interesting to me to see a short vignette describing the intuition/theory behind the code. I also think it would be helpful to have a function that automatically does the resampling, e.g. one covering the lines below (a rough sketch of such a helper follows the code), while also handling the case of multiple parameters.

# Resample iteration indices with probability proportional to the importance weights
resampledIterations <- sample(
  seq_along(originalSamples),
  replace = TRUE,
  prob = importanceSamplingResults$weights
)

# Draws from the updated posterior for mu
resampledMu <- originalSamples[resampledIterations]
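A rough sketch of what such a helper might look like (hypothetical code, not taken from the package), resampling whole rows of an iterations-by-parameters draws matrix so that all parameters stay aligned:

resampleDraws <- function(draws, weights, ndraws = nrow(draws)) {
  # Sample row indices with probability proportional to the importance weights
  resampledIterations <- sample(
    seq_len(nrow(draws)),
    size = ndraws,
    replace = TRUE,
    prob = weights
  )
  # Use the same indices for every parameter (column)
  draws[resampledIterations, , drop = FALSE]
}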

Thanks for the suggestions. We’ve just made the Python implementation publicly available here:

Thanks @avehtari, I wasn't aware of posterior's resampling functions. I've implemented simple multinomial resampling in the Python implementation but intend to add systematic resampling.
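For reference, a small sketch of systematic resampling (hypothetical code, written in R here to match the rest of the thread, not taken from either package); it uses a single uniform offset and evenly spaced points through the cumulative weights, which typically gives lower variance than multinomial resampling:

systematicResample <- function(weights, ndraws = length(weights)) {
  w <- weights / sum(weights)
  # One random offset, then evenly spaced positions in (0, 1)
  positions <- (runif(1) + seq_len(ndraws) - 1) / ndraws
  # Index of the draw whose cumulative-weight interval contains each position
  pmin(findInterval(positions, cumsum(w)) + 1, length(w))
}

The function returns indices into the original draws, which can then be used to subset them in the same way as the multinomial version.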

Cheers. I've implemented multiple-parameter resampling in the Python implementation, and based on what Aki says above, posterior should be able to handle the resampling in R.

Thanks. I like @jonah's suggestion of a case study and agree that this background would be useful for those unfamiliar with importance sampling.
