Interpreting output from compare() of loo

annptr · February 22, 2018, 6:05pm

If I am comparing two models using compare() of the loo package, how should I interpret the SE values?
I compared three models, A, B, and C, and the compare() output is:

compare(looB, looA)
elpd_diff se
-431.1 84.7

compare(looC, looB)
elpd_diff se
42.3 14.3

How can I interpret the SE values in this output?

avehtari · February 22, 2018, 9:43pm

?loo::compare says

The difference will be positive if the expected predictive accuracy for the second model is higher.

and

To compute the standard error of this difference we can use a paired estimate to take advantage of the fact that the same set of N data points was used to fit both models. These calculations should be most useful when N is large, because then non-normality of the distribution is not such an issue when estimating the uncertainty in these sums.
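As a rough illustration of the paired estimate described in that help text, here is a minimal Python sketch (not the loo package's own code). It assumes you have the pointwise elpd contributions for both models on the same N data points; the function name `paired_elpd_diff` and the example values are made up for illustration. The difference is taken as second model minus first model, matching the sign convention quoted above.

```python
import math
import statistics


def paired_elpd_diff(pointwise_first, pointwise_second):
    """Return (elpd_diff, se) for the second model relative to the first.

    Both inputs are pointwise elpd contributions for the same N data points.
    """
    if len(pointwise_first) != len(pointwise_second):
        raise ValueError("both models must be evaluated on the same data points")
    # Pair the contributions point by point, then summarise the differences.
    diffs = [second - first for first, second in zip(pointwise_first, pointwise_second)]
    n = len(diffs)
    elpd_diff = sum(diffs)
    # Standard error of a sum of N paired differences: sqrt(N) * sd(differences).
    se = math.sqrt(n) * statistics.stdev(diffs)
    return elpd_diff, se


# Hypothetical pointwise values, for illustration only.
pointwise_a = [-1.2, -0.8, -1.5, -0.9, -1.1]
pointwise_b = [-1.0, -0.9, -1.1, -0.7, -1.0]

elpd_diff, se = paired_elpd_diff(pointwise_a, pointwise_b)
print(f"elpd_diff = {elpd_diff:.2f}, se = {se:.2f}")
```

Because the differences are computed per data point, any variation shared by both models cancels, which is why the paired SE can be much smaller than the SE of either model's elpd on its own.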

You can read more in [1507.04544] Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC

dylancraven · September 26, 2018, 3:34pm

One related question: if the SE is bigger than the difference between the models, would one say that the models are similar (and vice versa)?

avehtari · September 26, 2018, 4:35pm

First, instead of SE, it’s better to consider something like 2SE, or the more cautious 4SE, where the 4 comes from the fact that the SE for LOO can be underestimated for small n or under bad model misspecification. Second, the models can be very different and their predictions can be very different; it’s just that the average predictive accuracies are close to each other. Third, SE describes uncertainty, so if the SE is large then it’s likely that the models do have a big difference in predictive accuracy, but we don’t know whether the difference is negative or positive.
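To make the 2SE and 4SE ranges concrete, here is a small Python sketch applied to the two compare() results posted at the top of this thread. The helper names `se_interval` and `excludes_zero` are hypothetical, and checking whether the range excludes zero is only one possible way to read the numbers, not a rule from the loo package.

```python
def se_interval(elpd_diff, se, multiplier):
    """Return the range elpd_diff +/- multiplier * se as (low, high)."""
    half_width = multiplier * se
    return (elpd_diff - half_width, elpd_diff + half_width)


def excludes_zero(interval):
    """True if the whole range lies on one side of zero."""
    low, high = interval
    return low > 0 or high < 0


# (elpd_diff, se) pairs copied from the first post in this thread.
comparisons = {
    "compare(looB, looA)": (-431.1, 84.7),
    "compare(looC, looB)": (42.3, 14.3),
}

for label, (elpd_diff, se) in comparisons.items():
    for multiplier in (2, 4):
        low, high = se_interval(elpd_diff, se, multiplier)
        verdict = "excludes 0" if excludes_zero((low, high)) else "includes 0"
        print(f"{label}: {multiplier}SE range = ({low:.1f}, {high:.1f}), {verdict}")
```

With these numbers, the first comparison stays well away from zero at both 2SE and 4SE, while the second excludes zero at 2SE but not at the more cautious 4SE.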

dylancraven · September 27, 2018, 8:27am

Thanks - this is very helpful advice!

Marimuthu · March 27, 2024, 1:42am

Hi,

Could you share a reference for using 2SE or 4SE, if any?

Marimuthu

avehtari · March 27, 2024, 9:06am

For reliability of SE estimate see

It is recommended that you report the difference and SE, and possibly the probability of one model being better than the other, but there is no single recommended threshold as the full information is better and the possible model choice you make depends on the context.
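One way to get a rough "probability of one model being better" from the reported difference and SE is a normal approximation, sketched below in Python. This is an assumption for illustration, not something the loo package prints for you, and it inherits the caveats mentioned earlier in this thread (the SE can be underestimated for small n or under bad misspecification). The function name `prob_second_better` is hypothetical.

```python
from statistics import NormalDist


def prob_second_better(elpd_diff, se):
    """Normal-approximation probability that the true elpd difference is > 0.

    elpd_diff follows the compare() convention from this thread:
    positive values favour the second model.
    """
    if se <= 0:
        raise ValueError("se must be positive")
    # P(true difference > 0) when the estimate is treated as Normal(elpd_diff, se).
    return NormalDist(mu=elpd_diff, sigma=se).cdf(0.0) * -1 + 1


# (elpd_diff, se) pairs copied from the first post in this thread.
p_a_over_b = prob_second_better(-431.1, 84.7)  # compare(looB, looA)
p_b_over_c = prob_second_better(42.3, 14.3)    # compare(looC, looB)

print(f"P(A better than B) ~ {p_a_over_b:.2g}")
print(f"P(B better than C) ~ {p_b_over_c:.3f}")
```

Reporting the difference, the SE, and a probability like this together keeps the full information available instead of collapsing it into a single threshold.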

Marimuthu · March 27, 2024, 4:37pm

Thank you. It will be useful.

Marimuthu
