Extract index of influential observations in loo() - brms

  • Operating System: Windows 10
  • brms Version: 2.8

I fitted a nonlinear mixed-effects model. However, when I run the loo() function on it, I get a high number of influential observations, around 100. Is there a way to find out which observations these are? Knowing that would help me understand why so many observations are being flagged in the first place.

I know that with the reloo option, the output reports which observation is left out each time the model is refitted, but I was wondering whether there is another way to access this information.

The loo object itself stores the Pareto k values (and many other helpful diagnostic quantities) for each observation. To extract the Pareto k values:

loo_fit <- loo(brms_fit)
k <- loo_fit$diagnostics$pareto_k
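
Once you have the vector k, plain indexing gives you the row numbers of the flagged observations. Here is a minimal sketch in base R; the k values are made up for illustration, and the 0.7 cutoff is an assumption (it is a commonly used threshold, but pick whichever one your loo output reports):

```r
# Illustrative Pareto k values only; in practice use
# k <- loo_fit$diagnostics$pareto_k
k <- c(0.12, 0.85, 0.43, 1.10, 0.71, 0.05)

# Assumed cutoff for "influential"; adjust to your own diagnostics
threshold <- 0.7

# Row indices of the observations whose k exceeds the cutoff
flagged <- which(k > threshold)

# Small table of flagged observations, worst first
flagged_tbl <- data.frame(obs = flagged, pareto_k = k[flagged])
flagged_tbl <- flagged_tbl[order(-flagged_tbl$pareto_k), ]
print(flagged_tbl)
```

The indices refer to rows of the data used to fit the model, so they can be used to subset that data frame and inspect the flagged rows directly.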

There are also plotting methods that can label the observations above a certain threshold, but with around 100 flagged observations the plot would be a rather unwieldy way of diagnosing.

To do so:

plot(loo_fit, label_points = TRUE)

With that many observations flagged, though, I would worry about model misspecification.

In addition to what @nerutenbeck said, the loo package has the functions pareto_k_ids() and pareto_k_values() specifically for this purpose.
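
A rough sketch of how these two functions might be used. To keep it self-contained, the loo object is built from a simulated log-likelihood matrix rather than a brms fit; the data, the stand-in posterior draws, and the 0.7 threshold are all assumptions for illustration. With a real model you would pass the result of loo(brms_fit) instead.

```r
library(loo)

set.seed(1)
S <- 1000                            # number of (fake) posterior draws
N <- 50                              # number of observations
y <- rnorm(N)
y[c(5, 20)] <- c(8, -9)              # two artificial outliers
mu <- rnorm(S, mean = 0, sd = 0.1)   # stand-in posterior draws for the mean

# S x N matrix of pointwise log-likelihood values
log_lik <- outer(mu, y, function(m, obs) dnorm(obs, mean = m, sd = 1, log = TRUE))

# With a brms model this would be: loo_fit <- loo(brms_fit)
loo_fit <- loo(log_lik)

# Pareto k for every observation, and the indices above the chosen threshold
k   <- pareto_k_values(loo_fit)
ids <- pareto_k_ids(loo_fit, threshold = 0.7)

print(data.frame(obs = ids, pareto_k = k[ids]))
```

The indices returned by pareto_k_ids() line up with the rows of the data, so something like your_data[ids, ] (your_data being a placeholder for the data frame you fitted on) shows the flagged observations.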
