Max Farrell's tutorial says refit model without outliers k>0.7

Regarding finding outliers with pareto k values > 0.7, Max Farrell’s tutorial says:

One or two moderate outliers (k>0.5) shouldn’t have too much of an effect on the model. But, rather than ignoring this, one option is to re-fit the model without these problematic observations and directly calculate the loo statistic for them.
loo1 <- loo(stan_glm1, k_threshold=0.7)
loo2 <- loo(stan_glm2, k_threshold=0.7)
We can then compare models:
compare(loo1, loo2)

What is loo(stan_glm1, k_threshold=0.7) doing? It also looks like k_threshold is no longer an argument for loo. Is this performing loo-cv but without the outlier values? Why is that justified?


Hi and welcome to Discourse!

The tutorial you linked is using the loo function from R package rstanarm, not from R package loo. rstanarm provides a nice vignette for how it implements loo here: Information criteria and cross-validation — loo.stanreg • rstanarm

Briefly, what’s happening here is the following:

  • loo is an approximation to leave-one-out cross-validation
  • For observations with high Pareto k, the approximation might fail
  • One solution is to actually carry out the leave-one-out procedure for the observations with bad Pareto k. This is what k_threshold does: for each observation whose Pareto k exceeds the threshold, instead of relying on PSIS-LOO to estimate the predictive performance when that observation is left out, rstanarm refits the model with just that observation removed and computes its contribution to the loo statistic exactly. It repeats this for every observation with bad Pareto k.
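The workflow above can be sketched as follows. This is a minimal illustration, not the tutorial's exact code: the formula and data are placeholders (mtcars), and it assumes rstanarm is installed.

```r
# Sketch of the k_threshold mechanism, assuming rstanarm is installed.
# Model and data are placeholders, not from the tutorial.
library(rstanarm)

fit <- stan_glm(mpg ~ wt + hp, data = mtcars)

# Plain PSIS-LOO: may warn about observations with Pareto k > 0.7
loo_psis <- loo(fit)

# With k_threshold, any observation whose Pareto k exceeds 0.7 is handled
# by refitting the model with that observation left out
loo_refit <- loo(fit, k_threshold = 0.7)

print(loo_refit)
```

Note that each refit runs the full sampler again, so the cost grows with the number of high-k observations.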

Thank you! That makes sense. Is this method suitable for and/or easily implementable for more complicated models with e.g. the package loo?


The procedure is certainly valid for other models (any time that leave-one-out cross-validation is valid, this procedure should be valid). However, the procedure is largely superseded by the newer implementations of loo that include corrections based on moment matching. See here for details.
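For a model fit with rstan, the moment-matching correction can be requested directly, along the lines of the sketch below (assuming a recent rstan/loo; fit is assumed to be a stanfit object whose generated quantities block produces a log-likelihood):

```r
# Hedged sketch: assumes 'fit' is an existing stanfit from rstan::stan or
# rstan::sampling, and that rstan/loo are recent enough to support
# moment matching (rstan >= 2.21, loo >= 2.2).
library(rstan)
library(loo)

loo_mm <- rstan::loo(fit, moment_match = TRUE)
print(loo_mm)
```

Moment matching adjusts the importance sampling proposal for the problematic observations using transformed posterior draws, so no refitting is needed.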

If moment matching is not a good option for you (e.g. if you’ve fit the model in pure Stan using complex constraint transforms that make wrangling the unconstrained draws prohibitive), then it shouldn’t be overly difficult to implement the procedure that rstanarm uses, I think. If you need to go this route, feel free to ping us again on this thread for pointers.
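A hand-rolled version of the rstanarm procedure might look roughly like this. Everything here is a placeholder sketch: stan_model_obj, standata, and the assumption that the data list has scalar N and vector y (and that the model emits log_lik) are all illustrative, not part of any particular package API.

```r
# Sketch of explicit refit-based LOO for high-k observations, for a
# pure-Stan model. All object names are placeholders.
library(rstan)
library(loo)

fit <- sampling(stan_model_obj, data = standata)
ll  <- extract_log_lik(fit, merge_chains = FALSE)
loo_res <- loo(ll, r_eff = relative_eff(exp(ll)))

bad <- which(pareto_k_values(loo_res) > 0.7)
for (i in bad) {
  # Refit with observation i removed; how to subset depends on your data block
  standata_i <- standata
  standata_i$y <- standata$y[-i]
  standata_i$N <- standata$N - 1
  fit_i <- sampling(stan_model_obj, data = standata_i)
  # Then compute log p(y_i | y_{-i}) by evaluating the likelihood of the
  # held-out y_i at each posterior draw of fit_i, and replace the unreliable
  # PSIS estimate, e.g.:
  # loo_res$pointwise[i, "elpd_loo"] <- log(mean(lik_i_draws))
}
```

The only model-specific step is evaluating the held-out likelihood from the refit's draws, which is usually easy to write down once you know the observation model.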


Thanks a lot. However, I can’t get moment matching to work. When I try loo2 <- loo(fit, moment_match = TRUE), I get the error:

Error in .local(object, ...) : 
  the model object is not created or not valid

In the vignette, they create their model fit object with the function sampling, whereas I use stan; however, the documentation says both functions should produce an object of S4 class stanfit. The loo function says it takes an object of S4 class stanfit, as does loo_moment_match, but that also gives me the same error.

Update: after finding this past question (Loo moment matching not working - #10 by sergej) I have tried refitting the model, but now R aborts as soon as I try loo2 <- loo(fit, moment_match = TRUE). This particular model is extremely quick to fit (4 chains of 2000 iterations run in less than a minute), so I don’t see why this would break.

Which package’s “loo” are you using?

I have tried both rstan::loo and loo::loo. It appears to work on my laptop now (I have yet to retry on my desktop, which is where it broke previously) after re-running the model fit with the argument cores=1. It's a bit annoying that it doesn’t work on model fits I have already run and saved, but hey ho. After moment matching, all k < 0.7, so all elpd estimates are probably okay. What does this mean for the other interpretation of k, i.e. judging which observations may be very influential on the posterior? Obviously doing moment matching doesn’t change the model fit, so I assume you still use the original k values from loo to say which observations may bias the data?

Yup. If you are using k to diagnose influential observations, then you would still use the originals. Note that "which observations may bias the data" is not precise language in this context. I can’t tell if I’m being obnoxiously pedantic or usefully clarifying by pointing this out. These observations with high k probably have outsized influence on the model posterior, but that doesn’t mean that the data are biased due to their inclusion. In some scenarios, high k can also be suggestive of model misspecification. For more, see
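For the diagnostic use of k, the original (pre-moment-matching) values can be pulled out directly; a small sketch, assuming fit is an existing model object that loo accepts:

```r
# Extract the unadjusted Pareto k values for influence diagnostics.
# 'fit' is a placeholder for an already-fitted model object.
library(loo)

loo_orig <- loo(fit)              # no moment matching
k <- pareto_k_values(loo_orig)    # helper from the loo package
which(k > 0.7)                    # candidate influential observations
```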


Thanks very much!
