Sample from posterior - with or without replacement?

JimBob · May 28, 2020, 3:00pm

Let’s say I’ve generated a posterior distribution for a parameter in a Stan regression model, represented by a vector of 30,000 posterior draws.

I might want to do some work with these posterior draws that involves a function that is very time-consuming to run many times, so it might be tricky to run on all 30,000 draws. If I just wanted to run it on, say, 1,500 draws, would I get a closer representation of the posterior by sampling with or without replacement?

hhau · May 28, 2020, 5:54pm

I suspect with replacement is better, but I’d have to do the simulations to be sure. At a rate of 1,500 out of 30,000 it’s probably unlikely to make much difference. If your function is very expensive to evaluate, it may be worth considering something like this: https://arxiv.org/abs/2005.03952

avehtari · May 28, 2020, 7:29pm

Without replacement is better than with.

The easiest approach is thinning, that is, keeping every mth iteration, which also reduces autocorrelation.

Stein thinning can produce a smaller subsample than plain thinning with the same accuracy for estimating certain expectations, but it has its own computational overhead, so I would start with thinning.

If you use R, I recommend the posterior package, which has useful tools for thinning and for checking the quality of the thinned posterior.
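
For readers who want to see the idea of taking every mth iteration spelled out, here is a minimal Python sketch. It is not the posterior package's interface; the function name `thin` and the placeholder `draws` list are hypothetical, and it assumes the draws are stored in iteration order for a single chain.

```python
def thin(draws, n_keep):
    """Keep about n_keep evenly spaced draws by taking every m-th one.

    Assumes `draws` is a sequence in iteration order from one chain.
    """
    if n_keep <= 0:
        raise ValueError("n_keep must be positive")
    # Spacing between kept draws; at least 1 so short inputs still work.
    m = max(1, len(draws) // n_keep)
    # Take every m-th draw, then trim in case the division was not exact.
    return draws[::m][:n_keep]


# Placeholder "posterior": the iteration index stands in for a draw.
draws = list(range(30_000))
subset = thin(draws, 1_500)  # keeps iterations 0, 20, 40, ...
```

With several chains, thinning each chain separately and then combining the results keeps the spacing in iterations meaningful.
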

hhau · May 29, 2020, 9:08am

Interesting! What’s the intuition behind this?

JimBob · May 29, 2020, 10:05am

Thanks @hhau and @avehtari, I will check out the posterior package. Thinning sounds like a simple and reasonable approach. I also hadn’t thought about the issue of autocorrelation, which would be reduced by selecting draws further apart.

My intuition regarding sampling with replacement was that it might make the sampled posterior seem marginally more precise than it really should be, because the most likely central points continue to retain very high probability, so maybe the tails don’t get as much opportunity to show themselves vs. the high probability center. However, that is a total guess!

avehtari · May 30, 2020, 9:36am

What I said assumes that we have a big sample from which we want to draw a small subsample (the situation is different if we want to resample as many draws as the original sample size).

Let’s first consider independent draws. If we resample without replacement, each draw appears in the smaller sample at most once, and when computing expectations every selected draw has the same weight. If we resample with replacement, some draws may appear more than once in the smaller sample, and when computing expectations this corresponds to giving the original draws different weights (which are binomially distributed). The variability in the weights reduces the effective sample size (see the effective sample size estimates for importance sampling for the equations).
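
The effect of the weight variability can be checked with a small simulation. The Python sketch below uses the usual importance-sampling effective sample size, (sum of weights)^2 / (sum of squared weights), applied to the number of times each original draw is selected. The seed, the variable names, and the 30,000 / 1,500 sizes (taken from the question) are illustrative assumptions, and the draws are treated as independent.

```python
import random
from collections import Counter


def weight_ess(counts):
    """Importance-sampling style ESS: (sum w)^2 / sum(w^2)."""
    counts = list(counts)
    total = sum(counts)
    return total * total / sum(c * c for c in counts)


rng = random.Random(1)   # arbitrary seed, only for reproducibility
N, n = 30_000, 1_500     # size of the full sample and of the subsample

# How many times each original draw index is selected.
with_repl = Counter(rng.choices(range(N), k=n))    # with replacement
without_repl = Counter(rng.sample(range(N), k=n))  # without replacement

ess_with = weight_ess(with_repl.values())
ess_without = weight_ess(without_repl.values())
print(f"with replacement:    ESS = {ess_with:.1f}")
print(f"without replacement: ESS = {ess_without:.1f}")
```

Without replacement every selected draw has weight 1, so the ESS equals n. With replacement the repeated draws pull it below n, to roughly n / (1 + (n - 1) / N) on average, so the loss is modest when n is much smaller than N.
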
If the draws are autocorrelated, we can do better than random resampling by preferring draws that are further apart in iteration number. The extreme case is deterministic thinning, taking every mth draw, which maximizes the distance in iterations.
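
As a rough illustration of why spacing the draws out helps, the Python sketch below simulates an AR(1) series as a stand-in for an autocorrelated chain and compares the lag-1 autocorrelation before and after keeping every 20th draw. The AR(1) model, the coefficient 0.9, the seed, and all names are assumptions made for the example, not output from a real Stan fit.

```python
import random


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


rng = random.Random(0)  # arbitrary seed, only for reproducibility
phi = 0.9               # assumed AR(1) coefficient (strong autocorrelation)

# Simulate 30,000 autocorrelated "draws".
chain = [0.0]
for _ in range(29_999):
    chain.append(phi * chain[-1] + rng.gauss(0.0, 1.0))

thinned = chain[::20]   # deterministic thinning: every 20th draw

acf_full = lag1_autocorr(chain)
acf_thinned = lag1_autocorr(thinned)
print(f"lag-1 autocorrelation, all draws: {acf_full:.2f}")
print(f"lag-1 autocorrelation, thinned:   {acf_thinned:.2f}")
```

For an AR(1) series, neighbouring thinned draws are 20 iterations apart, so their correlation is about 0.9^20 (roughly 0.12) instead of 0.9.
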
Stein thinning introduces an additional smoothness assumption (which means that it may fail if the posterior is not smooth) and favors draws that are also further away from each other in the parameter space. Including this additional smoothness prior reduces the variance of the expectation estimates (provided that the posterior is actually smooth).

JimBob · June 2, 2020, 10:14am

Very clear, thank you!

JimBob · June 2, 2020, 4:51pm

@avehtari nice. In something else I was trying, I realised I could use thinning to select and plot some different regression lines from the posterior, rather than random sampling or just taking the first 25. It does indeed look like a much more varied and representative sample from the posterior.
