Is model averaging using pp_average.brmsfit a good way out of fitting very large datasets?

George_GL · April 22, 2022, 9:57am

Hi everyone,

I just have a quick technical question.

My task is to fit a mixed-effects model with random intercepts and random slopes in brms on a very large data set (400k observations), which would take days to run even on a good EC2 instance.

My strategy would be to draw random sub-samples from the complete dataset, fit the model to each one independently, and then average the posteriors using the pp_average.brmsfit function.

Question: is this a legitimate use of the pp_average function, and is my approach valid from a theoretical standpoint?

Thanks!
G.

avehtari · April 24, 2022, 2:48pm

No and no. Depending on the shape of the posterior, you could use consensus Monte Carlo, or something more elaborate as in "Expectation Propagation as a Way of Life: A Framework for Bayesian Inference on Partitioned Data" (this paper also has an overview of the problem you are interested in).
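
For readers who have not seen consensus Monte Carlo, here is a minimal sketch of the idea in Python. It is not brms code and not anything posted in this thread. It assumes a toy model (a normal mean with known standard deviation) so that each shard's posterior can be sampled exactly, and every function name and number in it is hypothetical. The point it illustrates is that the full posterior is proportional to the product of the shard posteriors when each shard uses the prior raised to the power 1/S, so the draws are combined by a precision-weighted average rather than pooled or averaged as a mixture. The combination is exact only when the shard posteriors are Gaussian, and approximate otherwise.

```python
import random
import statistics


def subposterior_draws(y_shard, n_shards, n_draws, rng, sigma=1.0, prior_sd=10.0):
    """Exact draws from one shard's posterior for a normal mean (known sigma).

    Assumption: a N(0, prior_sd^2) prior raised to the power 1/n_shards,
    i.e. its variance is multiplied by n_shards, so that the shards
    together count the prior exactly once.
    """
    prior_prec = 1.0 / (n_shards * prior_sd ** 2)
    data_prec = len(y_shard) / sigma ** 2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * sum(y_shard) / sigma ** 2
    return [rng.gauss(post_mean, post_var ** 0.5) for _ in range(n_draws)]


def consensus_combine(shard_draws):
    """Precision-weighted average of the g-th draw from every shard."""
    weights = [1.0 / statistics.variance(d) for d in shard_draws]
    total = sum(weights)
    n_draws = len(shard_draws[0])
    return [
        sum(w * d[g] for w, d in zip(weights, shard_draws)) / total
        for g in range(n_draws)
    ]


rng = random.Random(1)
y = [rng.gauss(2.0, 1.0) for _ in range(4000)]  # simulated data
n_shards = 4
shards = [y[i::n_shards] for i in range(n_shards)]  # disjoint partition
draws = [subposterior_draws(s, n_shards, 2000, rng) for s in shards]

consensus = consensus_combine(draws)
pooled = [x for d in draws for x in d]  # naive mixture of the shard posteriors

# Analytic full-data posterior for comparison (sigma = 1, prior_sd = 10)
full_var = 1.0 / (1.0 / 10.0 ** 2 + len(y))
full_mean = full_var * sum(y)
full_sd = full_var ** 0.5

print("full-data posterior sd:", full_sd)
print("consensus sd:          ", statistics.pstdev(consensus))
print("pooled (mixture) sd:   ", statistics.pstdev(pooled))
```

In this toy setting the consensus draws should match the full-data posterior closely, while the pooled draws are roughly as wide as a single shard's posterior, which is why averaging posteriors from sub-samples overstates the uncertainty. For a hierarchical brms model the shard posteriors are multivariate and not necessarily Gaussian, so this should be read as an illustration of the principle only.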

George_GL · April 24, 2022, 3:56pm

Many thanks for the answer and I’ll definitely read the reference.
Thank you,
G.

P.S. Good to know I’m not alone in coming up with this… It sounded very appealing.
