I am comparing three different models using LOO. For one of these models, I am not sure the results can be trusted, because a few observations have problematic Pareto k values (output below). I tried dropping these observations, which did not qualitatively change the results. I would like to know whether keeping them is an acceptable practice.
MCSE of elpd_loo is NA.
MCSE and ESS estimates assume independent draws (r_eff=1).
Pareto k diagnostic values:
                           Count  Pct.   Min. ESS
(-Inf, 0.7]  (good)         3391  99.7%  347
 (0.7, 1]    (bad)             4   0.1%  <NA>
 (1, Inf)    (very bad)        5   0.1%  <NA>
See help('pareto-k-diagnostic') for details.
It isn’t surprising that dropping these values didn’t qualitatively change the results. The Pareto k diagnostic is telling you, in effect, that these observations are so influential on the posterior that the PSIS approximation cannot reliably "remove" them, so it cannot tell you what their contribution to the elpd is. Without knowing that, it is formally difficult to say with confidence that one model is better than another overall: one model could be dramatically better (to an unknown extent) for those particular observations, conceivably by enough to outweigh whatever pattern is present in the rest of the data. So, formally, neither dropping them nor keeping them yields results that are theoretically guaranteed to be even approximately correct. In practice, I haven’t seen many examples of this failing badly, however. To be sure, you could re-fit the model with each problematic observation left out (reloo) to compute the elpd contribution of each challenging observation exactly; see the sketch below.
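A minimal sketch of the reloo route, assuming the models were fit with brms and the fitted objects are named fit1, fit2, fit3 (hypothetical names; the workflow differs for rstanarm or raw Stan):

```r
## Assumes brms fits named fit1, fit2, fit3 (hypothetical). With
## reloo = TRUE, brms refits the model once for each observation whose
## Pareto k exceeds the threshold, so the elpd contribution of those
## points is computed exactly instead of approximated by PSIS.
library(brms)

loo1 <- loo(fit1, reloo = TRUE)  # the model with the problematic k values
loo2 <- loo(fit2)
loo3 <- loo(fit3)

## moment_match = TRUE is a cheaper alternative that can rescue moderately
## high k values, but in brms it generally requires the model to have been
## fit with save_pars = save_pars(all = TRUE).

## Compare as usual; judge elpd_diff against its se_diff.
loo_compare(loo1, loo2, loo3)
```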
Note that these problematic observations can indicate very strongly influential points in the data, and strongly influential points can in turn indicate model mis-specification. So it is always worth examining these points carefully to see what is going on; one way to pull them out is sketched below.
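A minimal sketch for inspecting the flagged observations, assuming loo1 is the loo object for the problematic model and d is the data frame the model was fit to (both names hypothetical):

```r
## Pull out the observations flagged by the Pareto k diagnostic and look
## at their raw data rows.
library(loo)

k_vals  <- pareto_k_values(loo1)                 # one Pareto k per observation
bad_ids <- pareto_k_ids(loo1, threshold = 0.7)   # indices of the flagged rows

## Are these rows outliers, data-entry errors, or a region of predictor
## space the model is clearly mis-specifying?
d[bad_ids, ]
k_vals[bad_ids]
```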