Compare for manyish models

Operating System: linux cluster or windows 10
Interface Version: R/3.5.1-nsc1-gcc-2018a , RStudio/1.1.463

Is there a smart way to use the compare function for loo estimates when there are many models to be compared (in my case 25 or so)? I can store the loo object for each model i as, e.g., LOOlist[[i]], but I can’t find any other way to compare them all than writing compare(LOOlist[[1]], LOOlist[[2]],…,LOOlist[[25]]). Has anyone come up with something neater?
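
For concreteness, a minimal sketch of that setup (the fits list, the model names, and calling loo() directly on each fit are illustrative assumptions, not code from the actual analysis):

library(loo)

# 'fits' is assumed to be a list of ~25 fitted models (e.g. rstanarm or brms
# objects), so loo() can be applied to each one and the results collected.
LOOlist <- lapply(fits, loo)
names(LOOlist) <- paste0("model", seq_along(LOOlist))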

do.call(compare, args = LOOlist) might work but the whole idea of comparing manyish models by ELPD is suspect because the bias gets largish.

do.call doesn’t seem to work, unfortunately.
But the notion that loo might be inappropriate when comparing multiple models is a larger issue. Why is that, and what would you suggest using instead? Is WAIC (or even DIC) more appropriate?

Those are less appropriate. All of these criteria are asymptotic, and the rates at which they converge are slower when there are more models to compare.

Thanks Ben
Many of the details behind LOO are new to me, so I apologize for the noobish questions.
When you say the bias gets largish, is there a general expectation about which direction it leans? Does it favor more complex models, or simpler ones?

I don’t know but @avehtari might.

See Piironen and Vehtari (2017). Comparison of Bayesian predictive methods for model selection. Statistics and Computing, 27(3):711-735. doi:10.1007/s11222-016-9649-y.
Some of the relevant points are also discussed in a tutorial video, which might clarify some of the basics more easily than the paper.

Whether 25 models is too many depends on the ratio of the number of observations to the number of parameters, and on how noisy the data are. If there are many models whose performance is close to that of the best model, then any number of models is too many.

No, they have similar behavior but are slightly worse. See the same paper and video.

Thanks Aki. That’s a great video presentation.
Is there any way to assess when ELPD (or LOOIC) values are too close to draw any conclusions?
My specific problem involves nested models, and I understood from your presentation that you think it’s better to use the more complex model for prediction if two models are close together. Is that correct? And what’s the reasoning behind that?

compare reports elpd_diff and diff_se, which give some assessment, but diff_se may be underestimated, and the normal approximation for the difference is also likely to be wrong when the models have similar predictions. The difference is still telling you something useful, though: elpd_diff and diff_se are also what is used in model averaging; see Yao, Vehtari, Simpson, and Gelman (2018). Using stacking to average Bayesian predictive distributions (with discussion). Bayesian Analysis, 13(3):917-1003.
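
For illustration, a sketch of pulling those quantities out and computing stacking weights, assuming LOOlist is the list of loo objects from above and loo >= 2.0 (loo_compare() labels the columns elpd_diff and se_diff; the older compare() labels them slightly differently):

cmp <- loo_compare(LOOlist)
# Row 1 is the best model (elpd_diff = 0); row 2 is the second best.
elpd_diff_top2 <- cmp[2, "elpd_diff"]   # difference relative to the best model (<= 0)
se_diff_top2   <- cmp[2, "se_diff"]
# Rough heuristic only, not a formal test (and se_diff may be underestimated):
abs(elpd_diff_top2) / se_diff_top2

# Stacking weights for model averaging (Yao et al. 2018):
wts <- loo_model_weights(LOOlist, method = "stacking")
print(wts)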

The more complex model includes the smaller model, and the uncertainty about which model is better is included in the posterior of the more complex model. Bayes theory says that if there is uncertainty you should integrate over it instead of fixing it to 0. Only if the more complex model were so much worse that it would get essentially zero probability in that integration could it be ignored. Finding the smallest model that can provide predictions similar to the most complex model is a different decision task, also discussed in that video.

Have you tried just compare(x = LOOlist)? You need to explicitly use the named argument x because it appears after the ... argument. (Model comparison (deprecated, old version) — compare • loo)

Even better, if you can install loo 2.1.0 (just released, so CRAN is still building binaries, but you can install from source), then you can use the new loo_compare(), which is a better version of compare():

comparison <- loo_compare(LOOlist)  # LOOlist is a list of loo objects
print(comparison)                   # models ordered by elpd, best first
colnames(comparison)                # see which columns are available

Yikes. compare(x=LOOlist) worked just fine! Embarrassing…

Thanks Aki. I think I just misinterpreted your point in the video as an argument for using only the more complex model because the uncertainty regarding all parameters is included in those predictions. But re-watching, I see that’s not what you’re saying. Your point about integrating over the uncertainty by including weighted predictions from both (or more) models makes perfect sense. Thanks. And again, great video.