Let’s say I’ve generated a posterior distribution for a parameter in a Stan regression model, represented by a vector of 30,000 parameter estimates.
I might want to do some work with these posterior estimates that involves a function that is very time-consuming to run a large number of times, so it might be tricky to apply to all 30,000 posterior estimates. If I just wanted to run it on, say, 1,500 samples, would I get a closer representation of the posterior by sampling with, or without, replacement?
I suspect with replacement is better, but I’d have to do the simulations to be sure. At 1,500 out of 30,000 it’s probably unlikely to make much difference? If your function is very expensive to evaluate, maybe consider something like this: https://arxiv.org/abs/2005.03952?
Without replacement is better than with.
Easiest is to use thinning, that is, picking every mth iteration, as this also reduces autocorrelation.
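The thinning above is a one-liner. A minimal sketch (my own illustration, not code from this thread; `draws` is a stand-in for your vector of posterior draws):

```python
import numpy as np

rng = np.random.default_rng(0)
draws = rng.normal(size=30_000)  # placeholder for 30,000 posterior draws

# Keep every m-th iteration: 30,000 / 20 = 1,500 retained draws,
# maximally spaced in iteration number.
m = 20
thinned = draws[::m]
print(len(thinned))  # 1500
```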
Stein thinning can produce a smaller subsample than plain thinning with the same accuracy for estimating certain expectations, but it has its own computational overhead, so I would start with thinning.
If you use R, I recommend using the posterior package, which has useful tools for thinning and for checking the quality of the thinned posterior.
Interesting! What’s the intuition behind this?
Thanks @hhau and @avehtari I will check out the posterior package. Thinning sounds like a simple and reasonable approach, and I also hadn’t thought about the issue of autocorrelation, which would be solved by selecting estimates further apart.
My intuition regarding sampling with replacement was that it might make the sampled posterior seem marginally more precise than it really should be, because the most likely central points continue to retain very high probability, so maybe the tails don’t get as much opportunity to show themselves vs. the high probability center. However, that is a total guess!
What I say is conditional on having a big sample from which we want to resample a small sample (the situation is different if we want to resample as many draws as the original sample size).
- Let’s first consider independent draws. If we resample without replacement, each draw appears in the smaller sample only once, and when computing expectations each draw has the same weight. If we resample with replacement, some draws may appear more than once in the smaller sample, and when computing expectations this corresponds to having the original draws with different weights (which are binomially distributed). The variability in the weights reduces the effective sample size (see the effective sample size estimates for importance sampling for equations).
- If we have autocorrelations, we can improve on random resampling by preferring draws that are further apart in iteration number. The extreme case is deterministic thinning, taking every mth draw, which maximizes the distance in iterations.
- Stein thinning introduces an additional smoothness assumption (which means that it may fail if the posterior is not smooth) and favors draws that are also further away from each other in the parameter space. Including this additional smoothness prior reduces the variance of the expectation estimates (provided the posterior actually is smooth).
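The first bullet can be checked numerically. A hedged sketch (my own illustration, not from the thread): resampling 1,500 of 30,000 indices with replacement gives some draws multiplicity greater than one; treating the multiplicities as weights, the importance-sampling-style effective sample size (sum w)^2 / sum w^2 falls below 1,500, whereas without replacement every weight is 1 and it stays exactly 1,500.

```python
import numpy as np

rng = np.random.default_rng(1)
n_big, n_small = 30_000, 1_500

# With replacement: multiplicities act as unequal weights.
idx_with = rng.choice(n_big, size=n_small, replace=True)
weights = np.bincount(idx_with, minlength=n_big).astype(float)
ess_with = weights.sum() ** 2 / (weights ** 2).sum()

# Without replacement: every retained draw has weight 1.
idx_without = rng.choice(n_big, size=n_small, replace=False)
ess_without = float(len(idx_without))

print(ess_with, ess_without)  # ess_with is a bit below 1500
```

The gap is small at a 1,500/30,000 ratio, consistent with the earlier comment that it probably doesn’t make much difference here, but without replacement never loses.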
@avehtari nice, in another thing I was trying I realised I could use thinning to sample and plot some different regression lines from the posterior, rather than random sampling or just taking the first 25. It indeed looks like a much more varied/representative sample from the posterior.