Quantifying Uncertainty with the LOO-CV criterion

Hi everyone, I have a very brief, perhaps dumb question: when looking at the output of LOO-CV, how do I interpret the magnitude of the results? For example, perhaps a normal model performs worse than a mixture, but by a “small” margin (either in the difference in expected log pointwise predictive density (elpd) or in the LOO Information Criterion), and so I might stick with the normal model, as it has a simpler interpretation of the parameters. How do I know, in a less arbitrary way, what constitutes a “small” difference and what constitutes a “significant” one?

Are there any rules of thumb at least? Would it be a terrible sin to construct asymptotically normal confidence intervals for the elpd difference? (@avehtari)

This is going to have a lot of the information you’re looking for: the CV-FAQ entry on interpreting the standard error (SE) of the elpd difference (elpd_diff).


Thank you @sjp, from my reading of “How to interpret in Standard error (SE) of elpd difference (elpd_diff)?”, what I am doing should be (approximately) correct.


I guess I should add the main points from the paper [2008.10296] Uncertainty in Bayesian Leave-One-Out Cross-Validation Based Model Comparison (which is now just mentioned there) to that FAQ answer. That paper gives the conditions under which you can trust the normal approximation. We recommend reporting the difference, the standard error of the difference, and possibly the probability that one model is better than another. If the difference is small also in magnitude (and not just relative to the diff SE), it doesn’t matter which model you choose, as they have similar predictive performance.

How would you calculate that? It is not directly in the output of loo_compare().
Would the output of loo_model_weights() be helpful?

What if I computed the Akaike weights this way?

delta_looic <- looic - min(looic)  # Difference from the best (lowest-LOOIC) model
model_probs <- exp(-delta_looic / 2) / sum(exp(-delta_looic / 2))  # Normalize to weights

No. See Using Stacking to Average Bayesian Predictive Distributions (with Discussion) for an explanation of the methods that can be used to compute weights with loo_model_weights().
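
For concreteness, a minimal sketch of what that looks like with the loo package (fit1 and fit2 are hypothetical fitted models for which loo() works, e.g. brms or rstanarm fits):

library(loo)

loo1 <- loo(fit1)  # PSIS-LOO for each hypothetical model
loo2 <- loo(fit2)

# stacking weights (the default method, recommended in the paper above)
loo_model_weights(list(loo1, loo2), method = "stacking")

# pseudo-BMA+ weights (pseudo-BMA with the Bayesian bootstrap)
loo_model_weights(list(loo1, loo2), method = "pseudobma")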

loo_compare() gives you elpd_diff and se_diff. If the conditions mentioned in [2008.10296] Uncertainty in Bayesian Leave-One-Out Cross-Validation Based Model Comparison hold, you can use these with pnorm() to compute the probability.
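
Something like this, for instance (a sketch assuming the normal approximation is valid; loo1 and loo2 are hypothetical loo objects for the two models):

cmp <- loo_compare(loo1, loo2)

# second row is the model with lower estimated elpd, so its elpd_diff is negative
elpd_diff <- cmp[2, "elpd_diff"]
se_diff <- cmp[2, "se_diff"]

# probability that the apparently worse model is actually the better one,
# treating the true difference as approximately N(elpd_diff, se_diff^2)
pnorm(elpd_diff / se_diff)
# equivalently: pnorm(0, mean = elpd_diff, sd = se_diff, lower.tail = FALSE)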

That is the same as loo_model_weights(..., method = "pseudobma", BB = FALSE).

Also note that looic is just -2*elpd_loo and that -2 is just a silly historical relic, which I think we should drop from loo package output. See also CV-FAQ 22 How are LOOIC and elpd_loo related? Why LOOIC is -2*elpd_loo?
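
For example, with a hypothetical loo object loo1 (say from loo(fit1)), the relation is easy to check from the estimates matrix:

# rows of the estimates matrix are elpd_loo, p_loo, and looic
all.equal(loo1$estimates["looic", "Estimate"],
          -2 * loo1$estimates["elpd_loo", "Estimate"])
# the SE scales the same way: SE(looic) = 2 * SE(elpd_loo)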

Got it, so a 95% CI using the normal approximation would also work in this case, right? I feel it might be more familiar to the readers of the paper. Are you suggesting using pnorm(elpd_loo$Estimate, mean = 0, sd = elpd_loo$SE) instead?

Sorry if I am slightly slow to follow.

Assuming the conditions hold, yes. A 95% CI (credible interval) based on the normal approximation contains the same information as elpd_diff and se_diff, although it requires a bit more thinking to get the probability of one model being better than the other. The danger in reporting the result as a 95% CI is that someone might erroneously use it for hypothesis testing with the null hypothesis that elpd_diff = 0, which is wrong because the probability of elpd_diff being exactly 0 is 0. Thus, if you report a 95% CI, it is good to also report the probability that one model is better than the other.
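
As a sketch of reporting both (reusing elpd_diff and se_diff from the loo_compare() row above, and again assuming the normal approximation holds):

# 95% interval for the elpd difference
elpd_diff + c(-1, 1) * qnorm(0.975) * se_diff

# probability that the apparently worse model is actually better
pnorm(elpd_diff / se_diff)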

I assume it’s not your fault, as it usually takes some time to unlearn what you have been taught about frequentist null hypothesis testing, and I wasn’t making it easy as I didn’t use that language.


Sorry @avehtari, but I just want to double-check that I understood how you would compute the probability that one model is better than the other. Would it be something like:

\mathrm{CDF}_{N(0,\, \mathrm{se}_{\mathrm{elpd}})}(0) - \mathrm{CDF}_{N(0,\, \mathrm{se}_{\mathrm{elpd}})}(\widehat{\mathrm{elpd}})
pnorm(0, mean = 0, sd = elpd_loo$SE) - pnorm(elpd_loo$Estimate, mean = 0, sd = elpd_loo$SE)

Check the example in Section 15 of the Nabiximols treatment efficiency case study.

You can also watch me explain why we need to examine the distribution of the pairwise differences in Lecture 9.1 of my Bayesian Data Analysis course.


Thank you very much @avehtari, the post is very clear. I now get what you were saying. Thanks for all the references, really helpful material.
