Aggregation posterior from small data sets

Mayormore · June 1, 2022, 10:25am

Hi, Stanimals! I would like to ask for suggested solutions.

My question: is there any strategy for aggregating posteriors computed on small data sets to approximate the posterior given the full, large data set?

I am working on a regression problem with medical images as covariates. I am sorry that the images cannot be shared; they are confidential for ethical reasons.

The problem is that the data set to be analyzed is huge, and both MCMC in Stan and ADVI run slowly. That is, computing the posterior given the full data set is quite difficult. On a small data set, however, the computation is feasible. So I wonder whether it is possible to compute posteriors on several small data sets in parallel and aggregate them at the end.

But I don’t know whether this is reasonable or how exactly to do it. I hope you can help me.
Many thanks!

maxbiostat · June 2, 2022, 1:05am

@yuling

yuling · June 24, 2022, 1:38am

It is a natural idea to employ a divide-and-conquer strategy that writes the target posterior as a product of subposteriors, and I think many subsampling MCMC methods exist that run MCMC on subsets in parallel and subsequently combine the subposteriors (e.g.), although many of the approximations rely on some exchangeability assumptions. I don’t recall a black-box implementation of such a subsampling strategy in Stan. @maxbiostat and I are doing some research in the same vein, so perhaps stay tuned.
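
To make the "product of subposteriors" idea concrete: if the data are split into K conditionally independent shards y_1, ..., y_K, then p(theta | y) is proportional to the product over k of p(theta)^(1/K) p(y_k | theta), so each shard can be fit with the prior tempered to the power 1/K. Below is a minimal Python sketch of one simple way to recombine the pieces, namely approximating each subposterior by a Gaussian and multiplying the Gaussians. This is only an illustration under strong assumptions, not a Stan feature and not the method under development: the function names are hypothetical, the toy model (normal mean with known variance) is chosen so that the exact answer is available, and the shard draws are simulated directly where in practice they would come from separate MCMC runs.

```python
import numpy as np


def product_of_gaussians(means, variances):
    """Multiply K univariate Gaussian densities; return (mean, variance)."""
    means = np.asarray(means, dtype=float)
    precisions = 1.0 / np.asarray(variances, dtype=float)
    combined_var = 1.0 / precisions.sum()
    combined_mean = combined_var * (precisions * means).sum()
    return combined_mean, combined_var


def combine_subposterior_draws(draws_per_shard):
    """Gaussian-approximate each shard's draws, then multiply the Gaussians."""
    means = [np.mean(d) for d in draws_per_shard]
    variances = [np.var(d, ddof=1) for d in draws_per_shard]
    return product_of_gaussians(means, variances)


def toy_demo(seed=0, n=3000, K=3, sigma=1.0, tau=10.0, n_draws=20000):
    """Toy check: y_i ~ N(theta, sigma^2), prior theta ~ N(0, tau^2)."""
    rng = np.random.default_rng(seed)
    y = rng.normal(1.5, sigma, size=n)
    shards = np.array_split(y, K)

    draws = []
    for shard in shards:
        # Subposterior: prior^(1/K), i.e. N(0, K * tau^2), times shard likelihood.
        prec = 1.0 / (K * tau**2) + len(shard) / sigma**2
        mean = (shard.sum() / sigma**2) / prec
        # Stand-in for MCMC output: exact draws from the subposterior.
        draws.append(rng.normal(mean, np.sqrt(1.0 / prec), size=n_draws))

    combined = combine_subposterior_draws(draws)

    # Exact full-data posterior for comparison.
    full_prec = 1.0 / tau**2 + n / sigma**2
    exact = ((y.sum() / sigma**2) / full_prec, 1.0 / full_prec)
    return combined, exact


if __name__ == "__main__":
    combined, exact = toy_demo()
    print("combined (mean, var):", combined)
    print("exact    (mean, var):", exact)
```

In this conjugate toy case the combination recovers the full posterior up to Monte Carlo error, because every subposterior really is Gaussian. For non-Gaussian or multimodal subposteriors, or when shards are not exchangeable, the Gaussian product can be a poor approximation.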

tinosai · July 30, 2022, 3:42am

Any updates on the state of development of this? I’d be really interested in participating and helping if you’re open to it.
