Is model averaging using pp_average.brmsfit a good workaround for fitting very large datasets?

Hi everyone,

I just have a quick technical question.

My task is to fit a mixed-effects model (random intercepts and random slopes) in brms on a very large data set: 400k observations, which would take days to run even on a good EC2 instance.

My strategy would involve drawing random sub-samples from the complete dataset, running the models independently, and then averaging the posteriors using the pp_average.brmsfit function.

Question: is this a legitimate use of the pp_average function, and is my approach valid from a theoretical standpoint?


No and no. pp_average is for averaging the posterior predictive distributions of different models fit to the same data, not for combining posteriors from disjoint subsets of one dataset. Depending on the shape of the posterior you could use consensus Monte Carlo, or something more elaborate such as the approach in "Expectation Propagation as a Way of Life: A Framework for Bayesian Inference on Partitioned Data" (that paper also gives an overview of the problem you are interested in).
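To make the consensus Monte Carlo idea concrete, here is a minimal, language-agnostic sketch of the combination step (shown in Python/NumPy rather than R; the shard means and standard deviations are made-up stand-ins for MCMC draws you would get by fitting the model to each data shard, ideally with the prior raised to the power 1/K on each shard). Draws from the K subposteriors are averaged draw-by-draw with precision weights, which is exact when each subposterior is Gaussian:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical subposteriors for a single scalar parameter: pretend each
# set of draws came from an MCMC fit on one of K = 3 data shards.
shard_means = [1.8, 2.1, 2.2]
shard_sds = [0.5, 0.4, 0.6]
subposteriors = [rng.normal(m, s, size=4000) for m, s in zip(shard_means, shard_sds)]

# Consensus Monte Carlo combination (Scott et al.): weight each shard's
# draws by the inverse of its subposterior variance and average draw-by-draw.
weights = np.array([1.0 / np.var(d) for d in subposteriors])
draws = np.stack(subposteriors)                      # shape (K, S)
consensus = (weights[:, None] * draws).sum(axis=0) / weights.sum()

print(consensus.mean(), consensus.std())
```

The combined draws concentrate around the precision-weighted mean of the shard posteriors (about 2.03 here) with a smaller standard deviation than any single shard, which is the behavior you would want from a full-data posterior. For non-Gaussian posteriors this simple weighting can be badly biased, which is why the EP-style methods in the paper above exist.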


Many thanks for the answer and I’ll definitely read the reference.
Thank you,

P.S. Good to know I'm not alone in coming up with this… It sounded very appealing.