I am comparing three different models using LOO. For one of these models, I am not sure the results can be trusted, because a few observations have problematic Pareto k values (output below). I tried dropping these observations, which did not qualitatively change the results. I would like to know whether keeping them is an acceptable practice.
MCSE of elpd_loo is NA.
MCSE and ESS estimates assume independent draws (r_eff=1).
Pareto k diagnostic values:
                           Count  Pct.   Min. ESS
(-Inf, 0.7]  (good)         3391  99.7%  347
 (0.7, 1]    (bad)             4   0.1%  <NA>
 (1, Inf)    (very bad)        5   0.1%  <NA>
See help('pareto-k-diagnostic') for details.
It isn’t surprising that dropping these values didn’t qualitatively change the results. The Pareto k diagnostic is telling you, in effect, that these observations are so influential on the posterior that the PSIS approximation cannot reliably "remove" them, so it cannot tell you what their contribution to the elpd is. Without knowing that, it is formally difficult to say with confidence that one model is better than another overall: one model could be dramatically better (to an unknown extent) for those particular observations, conceivably by enough to outweigh whatever pattern is present in the rest of the data. So, formally, neither dropping them nor keeping them yields results that are theoretically guaranteed to be even approximately correct. In practice, I haven’t seen many examples of this failing badly, however. To be sure, you could re-fit the model with each problematic observation left out (reloo) to compute the elpd contribution of each challenging observation exactly; see the sketch below.
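A minimal sketch of the reloo route, assuming the models were fit with brms and the fitted objects are named fit1, fit2, fit3 (hypothetical names; the workflow differs for rstanarm or raw Stan):

```r
## Assumes brms fits named fit1, fit2, fit3 (hypothetical). With
## reloo = TRUE, brms refits the model once for each observation whose
## Pareto k exceeds the threshold, so the elpd contribution of those
## points is computed exactly instead of approximated by PSIS.
library(brms)

loo1 <- loo(fit1, reloo = TRUE)  # the model with the problematic k values
loo2 <- loo(fit2)
loo3 <- loo(fit3)

## moment_match = TRUE is a cheaper alternative that can rescue moderately
## high k values, but in brms it generally requires the model to have been
## fit with save_pars = save_pars(all = TRUE).

## Compare as usual; judge elpd_diff against its se_diff.
loo_compare(loo1, loo2, loo3)
```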
Note that these problematic observations can indicate very strongly influential points in the data, and strongly influential points can in turn indicate model mis-specification. So it is always worth examining these points carefully to see what is going on; one way to pull them out is sketched below.
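A minimal sketch for inspecting the flagged observations, assuming loo1 is the loo object for the problematic model and d is the data frame the model was fit to (both names hypothetical):

```r
## Pull out the observations flagged by the Pareto k diagnostic and look
## at their raw data rows.
library(loo)

k_vals  <- pareto_k_values(loo1)                 # one Pareto k per observation
bad_ids <- pareto_k_ids(loo1, threshold = 0.7)   # indices of the flagged rows

## Are these rows outliers, data-entry errors, or a region of predictor
## space the model is clearly mis-specifying?
d[bad_ids, ]
k_vals[bad_ids]
```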