Compare for manyish models

Operating System: linux cluster or windows 10
Interface Version: R/3.5.1-nsc1-gcc-2018a , RStudio/1.1.463

Is there a smart way to use the compare function for loo estimates when there are many models to be compared (in my case 25 or so)? I can store the loo object for each model i as, e.g., LOOlist[[i]], but I can’t find any other way to compare them all than writing compare(LOOlist[[1]], LOOlist[[2]],…,LOOlist[[25]]). Has anyone come up with something neater?
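
For concreteness, a minimal sketch of that setup (the fits list, the model names, and calling loo() directly on each fit are illustrative assumptions, not code from the actual analysis):

library(loo)

# 'fits' is assumed to be a list of ~25 fitted models (e.g. rstanarm or brms
# objects), so loo() can be applied to each one and the results collected.
LOOlist <- lapply(fits, loo)
names(LOOlist) <- paste0("model", seq_along(LOOlist))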

do.call(compare, args = LOOlist) might work but the whole idea of comparing manyish models by ELPD is suspect because the bias gets largish.

do.call doesn’t seem to work, unfortunately.
But the notion that loo might be inappropriate when comparing multiple models is a larger issue. Why is that, and what would you suggest using instead? Is WAIC (or even DIC) more appropriate?

Those are less appropriate. All of these criteria are asymptotic, and the rates at which they converge are slower when there are more models to compare.

Thanks Ben
Many of the details behind LOO are new to me, so I apologize for the noobish questions.
When you say the bias gets largish, is there a general expectation about which direction it leans? Does it favor more complex models, or simpler ones?

I don’t know but @avehtari might.

See Piironen and Vehtari (2017). Comparison of Bayesian predictive methods for model selection. Statistics and Computing, 27(3):711-735. doi:10.1007/s11222-016-9649-y.
Some of the relevant points are also discussed in a tutorial video, which might clarify some of the basics more easily than the paper.

Whether 25 models is too many depends on the ratio of the number of observations to the number of parameters, and on how noisy the data are. If there are many models whose performance is close to that of the best model, then any number of models is too many.

No, they have similar behavior but are slightly worse. See the same paper and video.

Thanks Aki. That’s a great video presentation.
Is there any way to assess when ELPD (or LOOIC) values are too close to draw any conclusions?
My specific problem involves nested models, and I understood from your presentation that you think it’s better to use the more complex model for prediction if two models are close together. Is that correct? And what’s the reasoning behind that?

compare reports elpd_diff and diff_se, which give some assessment, but diff_se may be underestimated, and the normal approximation for the difference is also likely to be wrong when the models have similar predictions. The difference is still telling you something useful, though: elpd_diff and diff_se are also what is used in model averaging; see Yao, Vehtari, Simpson, and Gelman (2018). Using stacking to average Bayesian predictive distributions (with discussion). Bayesian Analysis, 13(3):917-1003.
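
For illustration, a sketch of pulling those quantities out and computing stacking weights, assuming LOOlist is the list of loo objects from above and loo >= 2.0 (loo_compare() labels the columns elpd_diff and se_diff; the older compare() labels them slightly differently):

cmp <- loo_compare(LOOlist)
# Row 1 is the best model (elpd_diff = 0); row 2 is the second best.
elpd_diff_top2 <- cmp[2, "elpd_diff"]   # difference relative to the best model (<= 0)
se_diff_top2   <- cmp[2, "se_diff"]
# Rough heuristic only, not a formal test (and se_diff may be underestimated):
abs(elpd_diff_top2) / se_diff_top2

# Stacking weights for model averaging (Yao et al. 2018):
wts <- loo_model_weights(LOOlist, method = "stacking")
print(wts)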

The more complex model includes the smaller model, and the uncertainty about which model is better is included in the posterior of the more complex model. Bayes theory says that if there is uncertainty you should integrate over it instead of fixing it to 0. Only if the more complex model were so much worse that it would get essentially zero probability in that integration could it be ignored. Finding the smallest model that can provide predictions similar to the most complex model is a different decision task, also discussed in that video.

Have you tried just compare(x = LOOlist)? You need to explicitly use the named argument x because it appears after the ... argument. (Model comparison (deprecated, old version) — compare • loo)

Even better, if you can install loo 2.1.0 (just released, so CRAN is still building binaries, but you can install from source), then you can use the new loo_compare(), which is a better version of compare():

comparison <- loo_compare(LOOlist)  # LOOlist is a list of loo objects
print(comparison)                   # models ordered by elpd, best first
colnames(comparison)                # see which columns are available

Yikes. compare(x=LOOlist) worked just fine! Embarrassing…

Thanks Aki. I think I just misinterpreted your point in the video as an argument for using only the more complex model because the uncertainty regarding all parameters is included in those predictions. But re-watching, I see that’s not what you’re saying. Your point about integrating over the uncertainty by including weighted predictions from both (or more) models makes perfect sense. Thanks. And again, great video.