I am trying to perform model comparisons of 3 hierarchical models (Poisson family, log link) fit on a dataset with approximately 180k observations.
Not sure if this is helpful, but the simplest model is of the form:
```r
formula0 <- case01 ~ -1 + var0 + prop_total123 +
  # Random intercepts for each stratum
  (1 | strata) +
  # Random slopes for each series
  (0 + prop_total123 | series)
```
And the most complex model is of the form:
```r
formula2 <- case01 ~ -1 + var0 + prop_1 + prop_2 + prop_3 +
  # Interactions with a two-level factor
  `2levelfactor` * prop_1 +
  `2levelfactor` * prop_2 +
  `2levelfactor` * prop_3 +
  # Random intercepts for each stratum (step in series)
  (1 | strata) +
  # Random slopes for each series
  (0 + prop_1 | series) +
  (0 + prop_2 | series) +
  (0 + prop_3 | series)
```
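For context, the comparison I'm ultimately after is along these lines (the model object names here are placeholders for my three fits, and each would first need a criterion attached):

```r
library(brms)

# Compare the three fitted models by LOO once add_criterion() has
# attached the "loo" criterion to each of them
loo_compare(model0, model1, model2, criterion = "loo")
```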
With the size of my dataset, fitting these models on a 32-core server with 128 GB of RAM takes anywhere from 5 to 12 hours, depending on iterations/cores/threads. No error messages appear and the diagnostic plots check out, but when I try to compare the models I've hit a wall. When I run `brms::add_criterion(model0, "loo")`, the function maxes out the 128 GB of RAM and crashes R. I then found `brms::loo_subsample()`, which uses very little RAM but has yet to return a result after running for about 30 hours.
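In case the exact calls matter, this is roughly what I have tried so far (object names are placeholders):

```r
library(brms)

# Attempt 1: exact PSIS-LOO; this maxes out 128 GB of RAM and crashes R
model0 <- add_criterion(model0, "loo")

# Attempt 2: subsampled LOO; RAM usage stays low, but no result after ~30 hours
loo_ss0 <- loo_subsample(model0)
```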
I'm new to brms and Stan, so perhaps my hopes for this to work are ill-conceived. Is it unrealistic to expect this to work on models of this size and complexity? Are there any options to speed things up? I've had a quick look at `loo::loo_subsample()`, but its inputs seem beyond my current knowledge of `brmsfit` object innards. Is there any obvious information I've overlooked that might be helpful for you to understand what's happening? I omitted a fully reproducible example because of the sheer size of the dataset, but I can always take a crack at one if required.
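One option I've seen in the docs but haven't tried yet (so this is an assumption on my part): `loo()` for brms models has a `pointwise` argument that is supposed to compute the log-likelihood one observation at a time, trading speed for memory. Would something like this be a sensible workaround?

```r
# Untested sketch: pointwise evaluation should avoid building the full
# draws-by-180k log-likelihood matrix in memory, at the cost of runtime
loo0 <- loo(model0, pointwise = TRUE)
```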
- Operating System: Linux Mint 20 Cinnamon
- brms Version: 2.18.0
- R Version: 4.2.1