Is model averaging with pp_average.brmsfit a valid workaround for fitting very large datasets?

Hi everyone,

I just have a quick technical question.

My task is to fit a mixed-effects model with random intercepts and random slopes in brms on a very large dataset (400k observations), which would take days to run on a good EC2 instance.

My strategy would be to draw random sub-samples from the complete dataset, fit the model to each sub-sample independently, and then average the posteriors using the pp_average.brmsfit function.

Question: is this a legitimate use of the pp_average function, and is my approach valid from a theoretical standpoint?


No and no. Depending on the shape of the posterior, you could use consensus Monte Carlo or something more elaborate, as in "Expectation Propagation as a Way of Life: A Framework for Bayesian Inference on Partitioned Data" (this paper also has an overview of the problem you are interested in).
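
To give a rough idea of what consensus Monte Carlo does, here is a minimal sketch in plain Python (not brms code). It assumes a toy model that is not the one from the question: a normal mean with known sigma and a N(0, tau^2) prior, so each shard's sub-posterior can be sampled exactly. The function names, the 4-way split, and all numbers are made up for illustration. Each shard uses the prior raised to the power 1/(number of shards), and the g-th draws from all shards are combined with weights equal to the inverse sample variance of each shard's draws. This combination is only exact when the posterior is Gaussian, which is why the shape of the posterior matters.

```python
import random
import statistics


def subposterior_draws(shard, n_shards, sigma, tau, n_draws, rng):
    """Exact draws from one shard's sub-posterior for a normal mean.

    Assumption: the N(0, tau^2) prior is split evenly across shards,
    i.e. raised to the power 1/n_shards, which gives N(0, n_shards * tau^2).
    """
    precision = len(shard) / sigma**2 + 1.0 / (n_shards * tau**2)
    mean = (sum(shard) / sigma**2) / precision
    sd = precision ** -0.5
    return [rng.gauss(mean, sd) for _ in range(n_draws)]


def consensus_combine(draws_by_shard):
    """Combine the g-th draw of every shard using inverse-variance weights."""
    weights = [1.0 / statistics.variance(d) for d in draws_by_shard]
    total = sum(weights)
    return [
        sum(w * d for w, d in zip(weights, row)) / total
        for row in zip(*draws_by_shard)
    ]


rng = random.Random(1)
sigma, tau, n_shards = 1.0, 10.0, 4

# Simulated data standing in for the large dataset (hypothetical values).
y = [rng.gauss(2.0, sigma) for _ in range(4000)]

# Partition the data into disjoint shards; every observation is used once.
shards = [y[s::n_shards] for s in range(n_shards)]

draws = [
    subposterior_draws(shard, n_shards, sigma, tau, 2000, rng)
    for shard in shards
]
consensus = consensus_combine(draws)

# Closed-form full-data posterior, available here only because the toy
# model is conjugate; used to check the combined draws.
full_precision = len(y) / sigma**2 + 1.0 / tau**2
full_mean = (sum(y) / sigma**2) / full_precision
full_sd = full_precision ** -0.5

print("consensus mean/sd:", statistics.mean(consensus), statistics.stdev(consensus))
print("full-data mean/sd:", full_mean, full_sd)
```

Note that the shards are a disjoint partition of the full data rather than overlapping random sub-samples, and that the draws are combined parameter by parameter, not averaged as predictions from competing models. For a multilevel model the sub-posteriors can be far from Gaussian, so treat this as an illustration of the idea only.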

Many thanks for the answer and I’ll definitely read the reference.
Thank you,

P.S. Good to know I’m not alone in coming up with this… It sounded very appealing.