Hi everyone, I have a very brief, perhaps dumb question: when looking at the output of LOO-CV, how do I interpret the magnitude of the results? For example, a normal model may perform worse than a mixture, but by a “small” margin (either in the difference in expected log pointwise predictive density (elpd) or in the LOO information criterion), so I might stick with the normal model, as its parameters have a simpler interpretation. How do I know, in a less arbitrary way, what constitutes a “small” difference and what constitutes a “significant” difference?
Are there any rules of thumb, at least? Would it be a terrible sin to construct asymptotically normal confidence intervals for the difference in expected log pointwise predictive density? (@avehtari)
Thank you @sjp, from my reading of “How to interpret Standard error (SE) of elpd difference (elpd_diff)?”, what I am doing should be (approximately) correct.
I guess I should add the main points from the paper [2008.10296] Uncertainty in Bayesian Leave-One-Out Cross-Validation Based Model Comparison (which is currently just mentioned there) to that FAQ answer. That paper gives the conditions under which you can trust the normal approximation. We recommend reporting the difference, the standard error of the difference, and possibly the probability that one model is better than the other. If the difference is small also in magnitude (and not just relative to the diff SE), it doesn’t matter which model you choose, as they have similar predictive performance.
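For concreteness, a minimal R sketch of that kind of reporting with the loo package (here `fit1` and `fit2` are placeholders for fitted models that have a `loo()` method, e.g. brms or rstanarm fits):

```r
library(loo)

# PSIS-LOO for each candidate model (fit1, fit2 are placeholder model fits)
loo1 <- loo(fit1)
loo2 <- loo(fit2)

# loo_compare() sorts the models by elpd; the elpd_diff and se_diff columns
# give the difference to the best model and its standard error, which is
# what would be reported
cmp <- loo_compare(loo1, loo2)
print(cmp)
```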
Got it, so a 95% CI using the normal approximation would also work in this case, right? I feel it might be more familiar to the readers of the paper. Are you instead suggesting using `pnorm(elpd_loo$Estimate, mean = 0, sd = elpd_loo$SE)`?
Assuming the conditions hold. A 95% CI (credible interval) based on the normal approximation contains the same information as elpd_diff and diff_se, although it requires a bit more thinking to get the probability of one model being better than the other. The danger in reporting the result as a 95% CI is that someone might erroneously use it for hypothesis testing with the null hypothesis that elpd_diff = 0, which is wrong, as the probability of elpd_diff = 0 is 0. Thus, if you report a 95% CI, it is good to also report the probability that one model is better than the other.
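To make that last point concrete, here is a sketch of one way to get that probability under the normal approximation, using the comparison matrix `cmp` from the sketch above (the indexing assumes the default two-model `loo_compare()` output, where the second row is the lower-ranked model):

```r
# Difference and its SE for the lower-ranked model (elpd_diff <= 0 by construction)
elpd_diff <- cmp[2, "elpd_diff"]
se_diff   <- cmp[2, "se_diff"]

# Normal approximation N(elpd_diff, se_diff) for the true elpd difference:
# probability that the lower-ranked model is actually the better one
p_better <- pnorm(0, mean = elpd_diff, sd = se_diff, lower.tail = FALSE)

# The corresponding 95% interval for the elpd difference, if a CI is reported
ci_95 <- elpd_diff + c(-1, 1) * qnorm(0.975) * se_diff
```

Note that `p_better` is the probability discussed above, not a p-value for a null hypothesis of elpd_diff = 0.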
I assume it’s not your fault, as it usually takes some time to unlearn what you have been taught about frequentist null hypothesis testing, and I wasn’t making it easy as I didn’t use that language.
Sorry @avehtari, but I just want to double-check that I understood how you would compute the probability that one model is better than the other. Would it be something like: