This forum has amassed several questions over the years (e.g. here, here, here, here, here, here) about whether we can efficiently update a Stan model if we collect a modest number of new data points, without refitting the whole thing.
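When this came up most recently, it occurred to me that such updating should generally be achievable via Pareto-smoothed importance sampling (PSIS), using the old posterior draws as the proposal and the updated posterior as the target. If the prior is unchanged and the new observations are conditionally independent of the old ones given the parameters, the importance ratio for each draw is simply the likelihood of the new data evaluated at that draw. In many cases this should work quite well: more data typically tighten the posterior, so the old posterior (the proposal distribution) has more mass in the tails than the new target posterior, which is the favorable direction for importance sampling. Maybe this approach is obvious, but it doesn't seem to have been mentioned in the posts linked above.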
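To make the reasoning concrete, here is a minimal base-R sketch on a toy conjugate model (an assumption for illustration: `y ~ Normal(mu, 1)` with prior `mu ~ Normal(0, 10)`), where the exact updated posterior is known. It draws from the "old" posterior, weights each draw by the likelihood of the new data, and compares the weighted estimates with the exact answer. Note that it uses plain self-normalized importance weights, not the Pareto-smoothed weights that PSIS would use; the helper `post_normal()` is hypothetical.

```r
set.seed(1)
y_old <- rnorm(10, mean = 1)   # data used for the original fit
y_new <- rnorm(20, mean = 1)   # newly collected data

# Exact conjugate posterior for mu (prior mean 0, known sigma = 1)
post_normal <- function(y, prior_sd = 10) {
  prec <- 1 / prior_sd^2 + length(y)   # posterior precision
  c(mean = sum(y) / prec, sd = sqrt(1 / prec))
}

old <- post_normal(y_old)
S <- 4000
mu_draws <- rnorm(S, old[["mean"]], old[["sd"]])  # stand-in for MCMC draws

# Log importance ratios: log-likelihood of the NEW data at each old draw
log_ratios <- vapply(mu_draws,
                     function(mu) sum(dnorm(y_new, mu, 1, log = TRUE)),
                     numeric(1))

# Self-normalize on the log scale to avoid underflow
w <- exp(log_ratios - max(log_ratios))
w <- w / sum(w)

est_mean <- sum(w * mu_draws)
est_sd <- sqrt(sum(w * (mu_draws - est_mean)^2))
ess <- 1 / sum(w^2)                      # effective sample size of the weights
exact <- post_normal(c(y_old, y_new))    # what a full refit would target

round(c(est_mean = est_mean, exact_mean = exact[["mean"]],
        est_sd = est_sd, exact_sd = exact[["sd"]], ess = ess), 3)
```

Because the new posterior is narrower than the old one, the weights stay well behaved and the effective sample size remains a sizeable fraction of `S`.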
In case it's useful to anyone, I wrote a function (and quickly wrapped it in an R package; the package is very much not ready for prime time) that uses PSIS to update a `brms` model based on new data. The approach is to compute the Pareto-smoothed importance weights reflecting the new data, then to actually resample (with replacement) the draws inside the `brmsfit` object using those weights. The result is a new `brmsfit` object that we can manipulate exactly like the old one, but whose draws reflect the updated posterior. This resampling step incurs a loss of information, but AFAIK it is necessary to enable summarizing and post-processing the updated model with standard `brms` tools.
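The resampling step can be sketched in base R on a plain matrix of draws. This is not the `upsis` implementation: `resample_draws()` is a hypothetical helper, it uses plain self-normalized weights (a comment shows where Pareto-smoothed weights from the `loo` package could be swapped in, assuming `loo` is installed), and it ignores chain structure. The returned `n_unique` and `ess` show the information loss mentioned above: duplicated draws carry no new information.

```r
# Resample posterior draws (rows of `draws`) with replacement using
# normalized importance weights, so downstream tools can treat the result
# as an ordinary, equally weighted set of draws.
resample_draws <- function(draws, log_ratios) {
  stopifnot(is.matrix(draws), nrow(draws) == length(log_ratios))
  # Plain self-normalized weights, computed on the log scale for stability.
  # For Pareto-smoothed weights one could instead use (loo package), e.g.
  #   w <- as.vector(weights(loo::psis(log_ratios), log = FALSE))
  w <- exp(log_ratios - max(log_ratios))
  w <- w / sum(w)
  S <- nrow(draws)
  idx <- sample.int(S, size = S, replace = TRUE, prob = w)
  list(
    draws = draws[idx, , drop = FALSE],
    ess = 1 / sum(w^2),             # effective sample size implied by w
    n_unique = length(unique(idx))  # distinct original draws that survive
  )
}

# Usage with made-up draws of two parameters and one new observation
set.seed(2)
draws <- cbind(alpha = rnorm(1000), beta = rnorm(1000, 1))
y_new <- 0.5
log_ratios <- dnorm(y_new, draws[, "alpha"], 1, log = TRUE)
res <- resample_draws(draws, log_ratios)
c(ess = res$ess, n_unique = res$n_unique)
```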
I've called the function and package `upsis`, for "Updating with PSIS" and for a way to efficiently "upsize" the data (get it?).
So you can do something like this:

```r
remotes::install_github("jsocolar/upsis")
library(brms)   # provides brm()
library(upsis)
set.seed(1)
# generate initial data
x <- rnorm(10)
y <- rnorm(10, x)
df <- data.frame(x = x, y = y)
# generate additional data
x2 <- rnorm(20)
y2 <- rnorm(20, x2)
df2 <- data.frame(x = x2, y = y2)
# fit initial model
fit <- brm(y ~ x, data = df, backend = "cmdstanr")
# update the posterior with the new data via PSIS (no refit)
fit2 <- upsis(fit, data_add = df2)
summary(fit)
summary(fit2$updated_model)
```
Note that using `upsis` is not always exactly the same as fitting a new `brms` model to the updated dataset, because `brms` sometimes uses data-dependent priors and/or model structures (e.g. the positions of knots in splines) that do not get updated by `upsis`.
In tinkering on this, I was pleasantly surprised by how easy it was to use the `loo` package to perform PSIS for purposes other than LOO-CV. Big props to @avehtari, @jonah, and company for this awesome tool! The trickiest part of all this (which still requires some code cleanup and refactoring to make it less brittle) was figuring out how to extract, resample, and reinsert the draws in a `stanfit` object.
`upsis` should be easily extensible to objects from `rstanarm`, and to any other object that provides a way to calculate the log likelihood of some set of new data.
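One way to picture that extensibility is an S3 generic whose only model-specific job is returning an S x N log-likelihood matrix for the new data; everything downstream (weights, resampling) stays class-agnostic. This is a hypothetical sketch, not part of `upsis`: `new_data_loglik()`, `new_data_log_ratios()`, and the `toy_normal` class are made up for illustration.

```r
# Hypothetical generic: the only model-specific piece is an S x N matrix of
# log-likelihood values of the new data, one row per posterior draw.
new_data_loglik <- function(object, newdata, ...) UseMethod("new_data_loglik")

# Hypothetical toy class: posterior draws of mu for y ~ Normal(mu, 1).
# A brmsfit method might instead wrap brms::log_lik(object, newdata = newdata),
# and a stanreg method might wrap rstanarm::log_lik(object, newdata = newdata).
new_data_loglik.toy_normal <- function(object, newdata, ...) {
  outer(object$mu, newdata$y, function(mu, y) dnorm(y, mu, 1, log = TRUE))
}

# Class-agnostic step: the log importance ratio of each draw is the sum of
# the new-data log-likelihood across observations
new_data_log_ratios <- function(object, newdata) {
  rowSums(new_data_loglik(object, newdata))
}

# Usage
set.seed(3)
toy <- structure(list(mu = rnorm(4000, 1, 0.3)), class = "toy_normal")
lr <- new_data_log_ratios(toy, data.frame(y = c(0.8, 1.2, 1.1)))
length(lr)  # one log ratio per posterior draw
```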