Speeding up multiple loo model comparisons in brms by using multiple cores?


I have 8 multilevel logistic regression brms models fit to the same data. Each model is relatively large, e.g., the saved model files are approx 1GB-1.5GB each. Comparing all 8 using waic takes about 10 minutes, but comparing them using loo looks like it will take many hours. I can’t say for sure, because after about 6 hours the comparison had used up all the RAM (250GB).
I presume I can solve the RAM problem with pointwise = TRUE, but that will make the comparison take even longer.
I am working on a 36-core machine, but loo is only using one core (adding cores = 2, or any other number, as an argument to loo does nothing).
I presume there is no reason in principle why the model comparison cannot be done in parallel, so is there any way of making this happen? I presume I could just use R’s parallel commands, like parLapply etc., to do all 28 pairwise comparisons. That’s fine, and I will try that, but I was wondering if I am missing something simple and easy.

  • Operating System: Linux (Ubuntu 18.04)
  • brms Version: 2.4.0


Try setting options(loo.cores = 36) or something before calling loo.
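A minimal sketch of that suggestion, for reference. The loo.cores option is the one named in this thread; as far as I know, loo 2.0.0 and later read mc.cores instead and treat loo.cores as deprecated, so setting both should be harmless. model_1 is a placeholder name for a fitted brmsfit object.

```r
# Set the core count once per session, before any call to loo().
n_cores <- 36                   # match this to your machine

options(loo.cores = n_cores)    # option name suggested in this thread (older loo releases)
options(mc.cores = n_cores)     # option name that newer loo releases are documented to read

# model_1 is a placeholder for a fitted brmsfit object:
# loo_1 <- brms::loo(model_1)
```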


Excellent. Just did now and it works.
Thanks a lot!


How many observations and how many model parameters do you have?


There are 5623 observations. The 8 models differ somewhat, but in one representative example I have 8 predictors and three grouping variables with 49, 182, and 930 levels, respectively. The 8 predictors vary randomly by two of those grouping variables, and the remaining grouping variable has just random intercepts. From the brms summary, there are 8 fixed effects, 36 random effects estimates (8 sd’s, 28 cor’s) for each of 2 grouping variables, and 1 random effects estimate for the third. So, by this count, there are 8 + 36 + 36 + 1 = 81 variables being estimated.
I also have 20000 post-warmup samples.

Using brms::loo on just one model at a time, with pointwise = FALSE, on a single core of a Xeon Gold 6154, the total running time is around 50 mins and it uses around 40GB. (When I ran brms::loo(model_1, model_2 ... model_8), which does all pairwise comparisons too, it ran for many hours, eventually filled all 250GB of RAM, and then aborted.)


20000 post-warmup draws is very likely overkill. An n_eff of about 2000 would be sufficient for loo, and would probably make the loo computation more than 10 times faster (10 times just from having fewer draws, and more from the more efficient memory usage).

Create a loo object for each model separately, and then call the compare function on those objects.
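A rough sketch of that workflow, assuming the fitted models are collected in a named list. The helper name compare_one_at_a_time and the list models are made up for illustration. loo::loo_compare() accepts a list of loo objects in loo 2.1.0 and later; on older releases the equivalent call should be loo::compare(x = loos).

```r
# `models` is a hypothetical named list of fitted brmsfit objects,
# e.g. list(m1 = model_1, m2 = model_2, ...).
compare_one_at_a_time <- function(models, ...) {
  # Compute one loo object per model, sequentially, so that only one
  # model's pointwise log-likelihood matrix has to be held at a time.
  loos <- lapply(models, function(m) brms::loo(m, ...))

  # Compare the precomputed loo objects; this step is cheap relative
  # to computing them.
  loo::loo_compare(loos)
}

# Usage (placeholders, not run here):
# comparison <- compare_one_at_a_time(list(m1 = model_1, m2 = model_2))
# print(comparison)
```

Keeping the per-model loo objects around (for example with saveRDS) also means a failed comparison does not force the expensive step to be repeated.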

If you had just 81 variables, and if p_eff < n/10 and you don’t get warnings, then the waic approximation is probably ok, too (only probably, because waic does not have diagnostics as good as those for loo). I’m not certain what “level” means here, but I guess it means the same as group id? In that case you would have many more than 81 variables being estimated?


I realize now that 20K samples is probably unnecessary. I initially did this because the SE for WAIC seemed to decrease when the number of samples increased, and so I thought it was wise to sample more rather than less for this reason.
When I say I have 81 variables, that underestimates the number of variables in the model, but I thought that’s what you were asking for. That number essentially just counts the entries of the random effects covariance matrices plus the fixed effects coefficients. By “levels” of a grouping variable, I mean the number of distinct values in the grouping variable. For example, one of my grouping variables is “subject”, which indicates the person in a cognitive psychology experiment, and I have random slopes that vary by subject. There were 182 subjects in my experiment, and 8 coefficients vary randomly per subject, so there alone we have 8 * 182 variables. Counting this way, I have 8 + 36 + 36 + 1 + (8 * 49) + (8 * 182) + (1 * 930) = 2859 (maybe more, if I’m forgetting something).
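The same count written out in R, as a check on the arithmetic. The variable names are made up for illustration; the numbers are the ones given in this thread.

```r
n_coef <- 8                          # coefficients that vary by group

fixed_effects   <- 8
# Per grouping variable with random slopes: 8 sd's + choose(8, 2) = 28 cor's
sd_cor_slopes   <- n_coef + choose(n_coef, 2)   # 36
sd_intercept    <- 1                 # third grouping variable: one sd only

# Group-level coefficients: one set per level of each grouping variable
group_coefs <- n_coef * 49 + n_coef * 182 + 1 * 930

n_total <- fixed_effects + 2 * sd_cor_slopes + sd_intercept + group_coefs
n_total                              # 2859

n_obs <- 5623
n_obs / n_total                      # observations per nominal parameter, just under 2
```

Note that this is the nominal parameter count, not p_eff, which loo and waic estimate from the posterior.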


Yes, this is how I would have counted, and in this case I would use loo instead of waic, as it has better diagnostics and is more reliable when n/p is low. Please let us know if using 2000 draws is fast enough for you.