Aggregation posterior from small data sets

Hi, Stanimals! I'd like to ask for suggested solutions!

My question is: is there any strategy for aggregating posteriors computed on small data sets to approximate the posterior given the full, large data set?

I am working on a regression problem in which the covariates are medical images. I'm sorry that the images cannot be shared; they are confidential for ethical reasons.

The problem is that the data set to be analyzed is quite large, and both MCMC in Stan and ADVI are slow. That is, computing the posterior given the full data set is quite difficult. On a small data set, however, the computation is feasible. So I wonder whether it is possible to compute posteriors on several small data sets in parallel and then aggregate them.

But I don't know whether this is reasonable or how exactly to do it. I hope you can help.
Many thanks!



It is a natural idea to employ a divide-and-conquer strategy that writes the target posterior as a product of subposteriors, and I think many subsampling MCMC methods exist that run MCMC on subsets in parallel and subsequently combine the subposteriors (e.g.), although many of these approximations rely on some exchangeability assumptions. I don't recall a black-box implementation of such a subsampling strategy in Stan. @maxbiostat and I are doing some research in the same vein, so stay tuned.
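As a rough illustration of the "product of subposteriors" idea, here is a minimal sketch of one well-known combination rule, Consensus Monte Carlo (Scott et al.), under stated assumptions. If the data are split into K shards y_1, ..., y_K that are conditionally independent given θ, then

$$
p(\theta \mid y) \propto \prod_{k=1}^{K} p(\theta)^{1/K}\, p(y_k \mid \theta),
$$

so each shard is fit with the prior raised to the power 1/K, and the draws are combined by precision-weighted averaging. This rule is exact only when the subposteriors are Gaussian; for skewed or multimodal posteriors it is an approximation. The toy normal-mean model, the simulated data, and the function names `consensus_combine` and `demo` are hypothetical; in a real analysis `sub_draws` would come from separate Stan fits, one per shard.

```python
import numpy as np


def consensus_combine(sub_draws):
    """Combine subposterior draws by precision-weighted averaging (Consensus Monte Carlo).

    sub_draws: list of K arrays, each of shape (S, d), holding S draws of a
    d-dimensional parameter from one subposterior. Draw s from every shard
    is combined into draw s of the approximate full posterior.
    """
    # Weight for each shard: inverse of its sample covariance (d x d).
    weights = [np.linalg.inv(np.atleast_2d(np.cov(d, rowvar=False))) for d in sub_draws]
    # Normalizer: inverse of the summed precisions.
    total = np.linalg.inv(sum(weights))
    # For each draw s, compute sum_k W_k @ theta_{k,s}, then left-multiply by total.
    weighted = sum(d @ w.T for d, w in zip(sub_draws, weights))
    return weighted @ total.T


def demo(seed=0, n=10_000, K=10, S=20_000, sigma=1.0, tau=10.0):
    """Hypothetical toy check: normal mean, known sigma, prior theta ~ N(0, tau^2)."""
    rng = np.random.default_rng(seed)
    y = rng.normal(1.5, sigma, size=n)  # simulated data, true theta = 1.5
    sub_draws = []
    for yk in np.array_split(y, K):
        # prior^(1/K) for N(0, tau^2) is proportional to N(0, K * tau^2).
        prec = 1.0 / (K * tau**2) + len(yk) / sigma**2
        mean = (yk.sum() / sigma**2) / prec
        # Exact conjugate draws stand in for per-shard MCMC output.
        sub_draws.append(rng.normal(mean, 1.0 / np.sqrt(prec), size=(S, 1)))
    combined = consensus_combine(sub_draws)[:, 0]
    # Exact full-data posterior, for comparison.
    full_prec = 1.0 / tau**2 + n / sigma**2
    full_mean = (y.sum() / sigma**2) / full_prec
    return combined, full_mean, 1.0 / np.sqrt(full_prec)


if __name__ == "__main__":
    combined, m, s = demo()
    print(f"consensus: mean={combined.mean():.4f}, sd={combined.std():.4f}")
    print(f"full data: mean={m:.4f}, sd={s:.4f}")
```

In this Gaussian toy case the consensus draws should match the full-data posterior up to Monte Carlo error. With non-Gaussian subposteriors (e.g. a regression with image covariates) the agreement can be much worse, which is one reason other combination schemes exist. In a Stan program the fractional prior for a shard could be written as, for example, `target += normal_lpdf(theta | 0, tau) / K;` with `K` passed in as data.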


Any updates on the state of development of this? I’d be really interested in participating and helping if you’re open to it.