Interpreting elpd_diff - loo package

The question is straightforward, but unfortunately the answer is not.

Short answer: a difference of 1 SE is definitely too small. If n is not small, there is no serious observation model misspecification (i.e., you have done model checking), and there are no khats > 0.7, then as a rule of thumb I would say a difference of 5 SE starts to be on the safe side.

A bit longer answer: First, we need all PSIS khats < 0.7 so that the Monte Carlo error does not dominate (in a forthcoming version of the loo package, we'll also provide an estimate of this Monte Carlo error). Second, we know that the SE estimate for a single model is optimistic when n is small and when the model is misspecified (Grandvalet and Bengio, 2004). Grandvalet and Bengio (2004) show theoretically that the true SE is less than 2 times the estimate. There is no similar result for model comparison, but we could assume it would be similar (we are researching this). The problem is further complicated because the uncertainty in the comparison is not necessarily well described by a normal distribution with some SE; especially for small n it would be better to take skewness and kurtosis into account, but that is not easy. We are researching ways to improve the SE estimate and the calibration of loo estimates. While you wait for new research results (and a better reference to cite), I would suggest using 5 x SE, where I picked 5 as 2 x 2.5: 2.5 corresponds roughly to a 99% interval, and 2 is the upper limit on the error given by Grandvalet and Bengio (2004).

Instead of the difference and SE, you could also compute Bayesian stacking weights ([1704.02030] Using stacking to average Bayesian predictive distributions, soon available in the loo package); if the weight of a model is 0, it is worse than the models with positive weight.
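The stacking objective is to choose simplex weights w maximizing sum_i log sum_k w_k exp(lpd_ik), where lpd_ik is the pointwise log predictive density (e.g. pointwise elpd_loo) of model k for observation i. Since this is the log likelihood of a mixture with fixed components, a simple EM-style update gives a rough numerical sketch (illustrative only; the loo package's own solver is different):

```python
import numpy as np

def stacking_weights(lpd, n_iter=1000):
    """Stacking weights from an (n_obs, n_models) matrix of pointwise
    log predictive densities, via EM-style mixture-weight updates.
    A sketch of the objective in the stacking paper, not loo's solver."""
    lpd = np.asarray(lpd, dtype=float)
    p = np.exp(lpd - lpd.max(axis=1, keepdims=True))  # stabilized densities
    n, k = p.shape
    w = np.full(k, 1.0 / k)                # start from uniform weights
    for _ in range(n_iter):
        r = p * w                          # per-point responsibilities
        r /= r.sum(axis=1, keepdims=True)  # normalize over models
        w = r.mean(axis=0)                 # EM update for mixture weights
    return w
```

With this in hand, a model whose weight ends up at (numerically) zero contributes nothing to the stacked predictive distribution, which is the sense in which it is worse than the models with positive weight.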