Idea for out-of-chain parallel MCMC

Hi,
This may be a naive question, but I am wondering about the validity of the following idea:

  1. Take k (for example, k = 10) random subsets of the data and run a sampling chain for every subset (say 1000 warmup iterations, 100 sampling iterations).
  2. For every set of 100 parameter samples, do a Bayesian update with the rest of the dataset (the k-1 other subsets), resulting in a weight for every sample. This should not be too computationally expensive, since we are computing on a discrete parameter space (the finite set of draws).
  3. All those k*100 weighted samples together should then be a good approximation of the full posterior distribution (see the sketch after this list).
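
To make the bookkeeping concrete, here is a minimal runnable sketch of steps 1-3 on a toy conjugate-normal model. The toy model, the numpy-only setup, and all names in it are my own illustration rather than anything proposed above; exact conjugate draws stand in for real MCMC chains. The idea being tested is that if chain k targets p(theta | D_k), then weighting its draws by the left-out likelihood p(D_{-k} | theta) should target the full posterior p(theta | D), at least up to per-chain constants.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y_i ~ Normal(mu, 1), prior mu ~ Normal(0, 10^2)
N, k = 1000, 10
y = rng.normal(2.0, 1.0, size=N)
subsets = np.array_split(rng.permutation(y), k)

prior_var = 10.0 ** 2

def subset_posterior_draws(y_sub, n_draws=100):
    # Conjugate posterior for mu given this subset only; stands in for
    # a real sampler run with 1000 warmup + 100 sampling iterations.
    post_var = 1.0 / (1.0 / prior_var + len(y_sub))
    post_mean = post_var * y_sub.sum()
    return rng.normal(post_mean, np.sqrt(post_var), size=n_draws)

def log_lik(theta, y_rest):
    # log p(D_{-k} | theta) for each draw (additive constants dropped)
    return -0.5 * ((y_rest[None, :] - theta[:, None]) ** 2).sum(axis=1)

all_draws, all_logw = [], []
for i in range(k):
    y_rest = np.concatenate([subsets[j] for j in range(k) if j != i])
    draws = subset_posterior_draws(subsets[i])   # step 1: subset chain
    all_draws.append(draws)
    all_logw.append(log_lik(draws, y_rest))      # step 2: weights

draws = np.concatenate(all_draws)
logw = np.concatenate(all_logw)

# Step 3: pool everything and self-normalize. Doing this naively across
# all k*100 draws ignores the per-chain constants p(D_k), which is where
# the marginal-likelihood issue mentioned further down comes in.
w = np.exp(logw - logw.max())
w /= w.sum()
print("weighted posterior mean:", np.sum(w * draws))
print("exact posterior mean:   ", y.sum() / (1.0 / prior_var + N))
```

Even on this toy example, the pooled weights can easily be dominated by a handful of draws, which is a cheap way to see how fragile the reweighting can get before trying it on a real model.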

The advantage here is that the procedure is fully parallel, so we could take advantage of HPC clusters.
Is there something I am missing?

I see that the 100 samples per chain are not very good for estimating the marginal likelihood. Alternatively, the updating could be done sequentially for every subset, thus always using the 900 left-out samples from the other chains.
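
To spell out why the marginal likelihood enters at all (assuming the observations are conditionally independent given theta): the weight relating a subset posterior to the full posterior carries a per-chain constant equal to the subset's marginal likelihood,

$$
\frac{p(\theta \mid D)}{p(\theta \mid D_k)}
= \frac{p(\theta)\, p(D \mid \theta) / p(D)}{p(\theta)\, p(D_k \mid \theta) / p(D_k)}
= \frac{p(D_k)}{p(D)}\; p(D_{-k} \mid \theta),
$$

so self-normalizing the weights within one chain is fine, but combining the k chains requires the constants p(D_k), and estimating those from 100 draws per chain is exactly the weak point.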

cheers,
Victor

Well, with any method, step one is laying it out and figuring out exactly what you’re getting right and what you’re getting wrong. If you’re cutting the data into pieces, that’d be the place to start. Maybe things can be recombined later, maybe not.

When it comes to statistical approximations, it’s easy to get caught up in the idea that whatever small assumption you’ve had to make won’t be that big a deal for the problem you’re working on, and that you’ll still get at the true posterior you’re after. Practically, though, it’s hard enough to figure out a useful model and get good sampling on it even with an exact algorithm. It’s fun to play with this stuff, but it’s hard to trust it.

Here’s a thing from Betancourt about the 8-schools model in Stan (simple model, small data, fancy algorithm -> still really hard to get it right): http://mc-stan.org/users/documentation/case-studies/divergences_and_bias.html

Here’s a thing by Bob on ensemble methods: http://andrewgelman.com/2017/03/15/ensemble-methods-doomed-fail-high-dimensions/

Best of luck!

edit: changed desc. of the Bob link

Or you could read Michael Betancourt’s paper,

and then follow that up with Andrew Gelman, Aki Vehtari, et al. on EP (expectation propagation, which uses the cavity distribution to mitigate some of the problems with the kind of naive subsampling you and others suggest):
