This may be a naive question, but I am wondering about the validity of the following idea:
- Partition the data into k random subsets (for example k=10) and run a sampling chain on every subset (say 1000 warmup iterations and 100 sampling iterations).
- For every set of 100 parameter samples, do a Bayesian update with the rest of the dataset (the k-1 other subsets), which yields a weight for every sample. This should not be too expensive computationally, since the update is over a discrete set of parameter values (the samples themselves).
- All k*100 weighted samples together should then be a good approximation of the full posterior distribution.
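To make the idea concrete, here is a minimal sketch on a toy Gaussian model. Everything model-specific is my own illustrative assumption: the per-subset chain is a simple random-walk Metropolis stand-in (in practice it would be a proper sampler), and the weights are self-normalised within each subset before the k sets are pooled, since the subposterior normalising constants are unknown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: x ~ Normal(theta, 1), prior theta ~ Normal(0, 10^2).
data = rng.normal(2.0, 1.0, size=500)
k = 10
subsets = np.array_split(rng.permutation(data), k)

def log_lik(theta, x):
    # Unit-variance Gaussian log-likelihood (constants dropped),
    # vectorised over a 1-D array of theta values.
    return -0.5 * np.sum((x[:, None] - theta[None, :]) ** 2, axis=0)

def subposterior_samples(x, n_warmup=1000, n_keep=100):
    # Stand-in for the per-subset chain: random-walk Metropolis
    # targeting p(theta | x) for the toy model.
    def log_post(t):
        return -0.5 * t ** 2 / 100.0 - 0.5 * np.sum((x - t) ** 2)
    theta, kept = 0.0, []
    for i in range(n_warmup + n_keep):
        prop = theta + rng.normal(0.0, 0.3)
        if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
            theta = prop
        if i >= n_warmup:
            kept.append(theta)
    return np.array(kept)

# One chain per subset -- this loop is the embarrassingly parallel part.
estimates = []
for j, subset in enumerate(subsets):
    theta_j = subposterior_samples(subset)
    rest = np.concatenate([s for i, s in enumerate(subsets) if i != j])
    # The "Bayesian update with the rest of the data" amounts to importance
    # weighting: target p(theta | D), proposal p(theta | D_j), so the weight
    # is proportional to p(D_{-j} | theta).
    logw = log_lik(theta_j, rest)
    w = np.exp(logw - logw.max())  # stabilise before exponentiating
    w /= w.sum()                   # self-normalise within the subset
    estimates.append(np.sum(w * theta_j))

# Pool the k weighted sample sets with equal mixture weight 1/k.
posterior_mean = np.mean(estimates)
```

Self-normalising per subset and then mixing the k sets equally is one way to sidestep the unknown per-subset normalising constants; pooling all k*100 samples under a single global normalisation would require knowing those constants.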
The advantage here is that the procedure is fully parallel, so we could take advantage of HPC clusters.
Is there something I am missing?
I do see that 100 samples are not very many for estimating the marginal likelihood. Alternatively, the updating could be done sequentially for every subset, thus always using the 900 left-out samples.