Hi everyone, I have a very brief, perhaps dumb question: when looking at the output of LOO-CV, how do I interpret the magnitude of the results? For example, a normal model may perform worse than a mixture, but by a “small” margin (either in the difference in expected log pointwise predictive density (elpd) or in the LOO information criterion), so I might stick with the normal model, as its parameters have a simpler interpretation. How do I know, in a less arbitrary way, what constitutes a “small” difference and what constitutes a “significant” difference?
Are there any rules of thumb, at least? Would it be a terrible sin to construct asymptotically normal confidence intervals for the difference in expected log pointwise predictive density? (@avehtari)
Thank you @sjp, from my reading of “How to interpret Standard error (SE) of elpd difference (elpd_diff)?”, what I am doing should be (approximately) correct.
I guess I should add the main points from the paper [2008.10296] Uncertainty in Bayesian Leave-One-Out Cross-Validation Based Model Comparison (which is currently just mentioned there) to that FAQ answer. That paper gives the conditions under which you can trust the normal approximation. We recommend reporting the difference, the standard error of the difference, and possibly the probability that one model is better than the other. If the difference is small also in magnitude (and not just relative to the diff SE), it doesn’t matter which model you choose, as they have similar predictive performance.
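For concreteness, a minimal R sketch of that kind of reporting with the loo package (here `fit1` and `fit2` are placeholders for fitted models that have a `loo()` method, e.g. brms or rstanarm fits):

```r
library(loo)

# PSIS-LOO for each candidate model (fit1, fit2 are placeholder model fits)
loo1 <- loo(fit1)
loo2 <- loo(fit2)

# loo_compare() sorts the models by elpd; the elpd_diff and se_diff columns
# give the difference to the best model and its standard error, which is
# what would be reported
cmp <- loo_compare(loo1, loo2)
print(cmp)
```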
Got it, so a 95% CI using the normal approximation would also work in this case, right? I feel it might be more familiar to the readers of the paper. Are you instead suggesting using `pnorm(elpd_loo$Estimate, mean = 0, sd = elpd_loo$SE)`?
Assuming the conditions hold. A 95% CI (credible interval) based on the normal approximation contains the same information as elpd_diff and diff_se, although it requires a bit more thinking to get the probability of one model being better than the other. The danger in reporting the result as a 95% CI is that someone might erroneously use it for hypothesis testing with the null hypothesis that elpd_diff = 0, which is wrong, as the probability of elpd_diff = 0 is 0. Thus, if you report a 95% CI, it is good to also report the probability that one model is better than the other.
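To make that last point concrete, here is a sketch of one way to get that probability under the normal approximation, using the comparison matrix `cmp` from the sketch above (the indexing assumes the default two-model `loo_compare()` output, where the second row is the lower-ranked model):

```r
# Difference and its SE for the lower-ranked model (elpd_diff <= 0 by construction)
elpd_diff <- cmp[2, "elpd_diff"]
se_diff   <- cmp[2, "se_diff"]

# Normal approximation N(elpd_diff, se_diff) for the true elpd difference:
# probability that the lower-ranked model is actually the better one
p_better <- pnorm(0, mean = elpd_diff, sd = se_diff, lower.tail = FALSE)

# The corresponding 95% interval for the elpd difference, if a CI is reported
ci_95 <- elpd_diff + c(-1, 1) * qnorm(0.975) * se_diff
```

Note that `p_better` is the probability discussed above, not a p-value for a null hypothesis of elpd_diff = 0.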
I assume it’s not your fault, as it usually takes some time to unlearn what you have been taught about frequentist null hypothesis testing, and I wasn’t making it easy as I didn’t use that language.
Sorry @avehtari, but I just want to double-check that I understood how you would compute the probability that one model is better than the other. Would it be something like: