Clarification about the purpose of 'reloo' in the loo function

#1


  • Operating System: Windows 10
  • brms Version:

I have managed to fit several multilevel nonlinear models and now I am trying to compare them, based on the loo criterion.
However, all of my models except one have 3-5 problematic observations each (Pareto k > 0.7), so R suggested adding the ‘reloo’ option, which refits the models.
I did that, and now every observation in every model has k < 0.7, but I am still not sure I understand how this works. I looked up the R documentation, which says that

reloo will refit the original model J times, each time leaving out one of the J problematic observations. The pointwise contributions of these observations to the total ELPD are then computed directly and substituted for the previous estimates from these J observations that are stored in the original loo object.
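The substitution step in this passage can be sketched as follows. This is a minimal illustration, not the actual brms or loo code: `substitute_exact_elpd` and `exact_elpd_fn` are hypothetical names, and `exact_elpd_fn(i)` stands in for "refit the model without observation i and evaluate log p(y_i | y_{-i}) from the refit draws". The 0.7 cutoff is the one mentioned above.

```python
from typing import Callable, Sequence


def substitute_exact_elpd(
    elpd_approx: Sequence[float],
    pareto_k: Sequence[float],
    exact_elpd_fn: Callable[[int], float],
    k_threshold: float = 0.7,
) -> tuple[list[float], list[int]]:
    """Replace approximate pointwise elpd values of high-k observations.

    exact_elpd_fn(i) is a hypothetical stand-in for refitting the model
    without observation i and computing log p(y_i | y_{-i}) directly.
    Returns the updated pointwise values and the indices that were refit.
    """
    pointwise = list(elpd_approx)
    flagged = [i for i, k in enumerate(pareto_k) if k > k_threshold]
    for i in flagged:
        # One refit per flagged observation (J refits in total).
        pointwise[i] = exact_elpd_fn(i)
    return pointwise, flagged


# Toy usage with made-up numbers: only observation 2 exceeds the cutoff.
pointwise, refit_idx = substitute_exact_elpd(
    [-1.0, -2.0, -9.0], [0.2, 0.5, 0.9], exact_elpd_fn=lambda i: -12.0
)
elpd_loo = sum(pointwise)  # the total is simply re-summed
print(refit_idx, pointwise, elpd_loo)
```

Observations below the cutoff keep their PSIS estimates untouched; only the flagged entries change before the total is summed again.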

I find this part confusing

The pointwise contributions of these observations to the total ELPD are then computed directly and substituted for the previous estimates from these J observations that are stored in the original loo object.

What is meant by ‘computed directly’? Do we just see how the ELPD changes when the model is fitted without observation i, compared to the model fitted to all data points, and take the difference between the two as observation i’s pointwise contribution?

Also, I don’t think I understand what exactly the purpose of the reloo option is.
Even if we refit the model, find out the pointwise contributions of the problematic observations, and compare the refitted models, how does this change the fact that the original models have some ‘influential’ observations?

That is, say we have model1 and model2, with model1 having 5 problematic observations and model2 having 7 (out of 590).
After refitting with the reloo option, model2 turns out to be the preferred one (higher ELPD). But even if loo_compare says it is the preferred model, how does this change the fact that the original model2 has more influential observations?
Doesn’t this mean the original model2 would still predict worse than model1?

I hope my question makes sense.

#2

Pareto-smoothed importance sampling is a way to approximate the (in your case) leave-one-out posterior. You fit your model once and then for each data point, the loo package re-weights the posterior that you have to approximate the leave-one-out posterior.
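To make the re-weighting idea concrete, here is a minimal Python sketch on an assumed toy model (normal data with known σ and a normal prior on the mean μ), chosen because the exact leave-one-out predictive density has a closed form to compare against. All function names are hypothetical, and the estimator is plain importance sampling without the Pareto smoothing step that loo adds.

```python
import numpy as np


def normal_logpdf(x, mean, sd):
    """Log density of N(mean, sd^2) at x (elementwise on arrays)."""
    return -0.5 * np.log(2.0 * np.pi * sd**2) - 0.5 * ((x - mean) / sd) ** 2


def posterior_params(y, sigma, prior_mean, prior_sd):
    """Conjugate posterior of mu for y_j ~ N(mu, sigma^2), mu ~ N(prior_mean, prior_sd^2)."""
    precision = 1.0 / prior_sd**2 + len(y) / sigma**2
    mean = (prior_mean / prior_sd**2 + np.sum(y) / sigma**2) / precision
    return mean, np.sqrt(1.0 / precision)


def exact_loo_elpd(y, i, sigma, prior_mean, prior_sd):
    """'Refit' without y[i] (analytically here) and return log p(y_i | y_{-i})."""
    m, s = posterior_params(np.delete(y, i), sigma, prior_mean, prior_sd)
    # The posterior predictive for a new point is N(m, sigma^2 + s^2).
    return normal_logpdf(y[i], m, np.sqrt(sigma**2 + s**2))


def is_loo_elpd(y, i, mu_draws, sigma):
    """Plain importance-sampling estimate of log p(y_i | y_{-i}) from full-data draws.

    Ratios r_s proportional to 1 / p(y_i | mu_s) re-weight the full posterior
    toward the leave-one-out posterior, so that
    p(y_i | y_{-i}) ~ 1 / mean_s(1 / p(y_i | mu_s)). No Pareto smoothing here.
    """
    neg_log_lik = -normal_logpdf(y[i], mu_draws, sigma)
    top = neg_log_lik.max()
    # log(mean(exp(neg_log_lik))) via a stable log-sum-exp, then negated
    return -(top + np.log(np.mean(np.exp(neg_log_lik - top))))


rng = np.random.default_rng(1)
y = rng.normal(0.0, 1.0, size=20)
sigma, prior_mean, prior_sd = 1.0, 0.0, 10.0
m, s = posterior_params(y, sigma, prior_mean, prior_sd)
mu_draws = rng.normal(m, s, size=4000)  # a single fit, reused for every i
for i in range(3):
    exact = exact_loo_elpd(y, i, sigma, prior_mean, prior_sd)
    approx = is_loo_elpd(y, i, mu_draws, sigma)
    print(i, round(float(exact), 3), round(float(approx), 3))
```

Every `is_loo_elpd` call reuses the same `mu_draws`, which is what makes the approach cheap: one fit serves all N observations.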

This is really awesome because it lets you check your model on each left-out data point without fitting it N times; it almost feels like cheating. But like every approximation, it can fail. This happens for “influential” data points, because the importance ratios that loo calculates internally to re-weight your posterior might have infinite variance, and then even smoothing these weights does not help. For these cases, it is best to bite the bullet and refit the model, because then you get draws from the “true” leave-one-out posterior rather than from an approximation of it.
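The failure mode can be illustrated on the same kind of assumed toy model (normal data, known σ, flat prior on μ). The sketch compares the upper tail of the importance ratios for a typical observation and for an injected outlier, using a Hill estimator of the tail shape. This is a swapped-in, cruder diagnostic, not loo's Pareto k (loo fits a generalized Pareto distribution to the largest ratios), and the function names are hypothetical.

```python
import numpy as np


def log_importance_ratios(y_i, mu_draws, sigma):
    """log r_s = -log p(y_i | mu_s) for a normal likelihood, up to an additive constant."""
    return 0.5 * ((y_i - mu_draws) / sigma) ** 2


def hill_tail_shape(log_ratios, tail_fraction=0.2):
    """Hill estimate of the tail shape of the ratio distribution.

    Uses only the largest `tail_fraction` of the ratios. A rough stand-in for
    the Pareto k diagnostic: larger values mean a heavier tail, and if the
    true tail shape is 0.5 or more, the ratios have infinite variance.
    """
    ordered = np.sort(log_ratios)
    m = int(tail_fraction * len(ordered))
    threshold = ordered[-m - 1]  # largest log ratio just below the tail
    return float(np.mean(ordered[-m:] - threshold))


rng = np.random.default_rng(2)
sigma = 1.0
y = np.append(rng.normal(0.0, 1.0, size=19), 6.0)  # the last point is an outlier
# With a flat prior the posterior of mu is N(mean(y), sigma^2 / n).
mu_draws = rng.normal(y.mean(), sigma / np.sqrt(len(y)), size=4000)

typical = int(np.argmin(np.abs(y - y.mean())))
k_typical = hill_tail_shape(log_importance_ratios(y[typical], mu_draws, sigma))
k_outlier = hill_tail_shape(log_importance_ratios(y[-1], mu_draws, sigma))
print(f"typical point: {k_typical:.2f}, outlier: {k_outlier:.2f}")
```

The outlier's ratios have a much heavier upper tail, which is the situation where a handful of draws dominate the weighted average and refitting becomes the safer option.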

Coming to your actual question: elpd (expected log pointwise predictive density) is a way to compare models. For each data point, you can use the leave-one-out posterior to calculate how surprised the model is to see the left-out data point, measured by the log predictive density at the observed value (lower means more surprised). This is repeated for every data point, and the elpd that loo reports is just the sum of these individual contributions. So what reloo does is use the “true” leave-one-out posterior instead of the approximated one for those points where the approximation is not reliable.
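To make ‘computed directly’ concrete, these are the standard definitions, with θ denoting all model parameters and S the number of posterior draws. The PSIS line is simplified: the actual smoothing of the weights happens inside loo.

```latex
\mathrm{elpd}_{\mathrm{loo}} = \sum_{i=1}^{N} \log p(y_i \mid y_{-i}),
\qquad
p(y_i \mid y_{-i}) = \int p(y_i \mid \theta)\, p(\theta \mid y_{-i})\, d\theta

% PSIS: draws \theta^{(s)} \sim p(\theta \mid y) from the single full-data fit
\hat{p}_{\mathrm{PSIS}}(y_i \mid y_{-i})
  = \frac{\sum_{s=1}^{S} w_i^{(s)}\, p(y_i \mid \theta^{(s)})}{\sum_{s=1}^{S} w_i^{(s)}},
\qquad
w_i^{(s)} = \text{Pareto-smoothed version of } \frac{1}{p(y_i \mid \theta^{(s)})}

% reloo: draws \theta_{-i}^{(s)} \sim p(\theta \mid y_{-i}) from a refit without y_i
\hat{p}_{\mathrm{exact}}(y_i \mid y_{-i})
  = \frac{1}{S} \sum_{s=1}^{S} p(y_i \mid \theta_{-i}^{(s)})
```

So observation i's pointwise contribution is the log of this predictive density itself, not a difference between two ELPD totals; reloo swaps the PSIS value for the refit value for each flagged i and sums again.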

But even if loo_compare says it is the preferred model, how does this change the fact that the original model2 has more influential observations?

Observations with high k are only problematic for approximating the leave-one-out posterior; there’s nothing inherently bad about them. These points usually influence the posterior more strongly than other ones, which in turn means that they often also have a lower elpd than other points, but it still holds true that model2 better predicts your observed data.
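What the comparison works with is the set of pointwise elpd values (after any reloo substitutions); how often the PSIS approximation failed does not enter except through those values. As far as I know, loo_compare sums the pointwise differences between two models and uses √N times their standard deviation as the standard error. The sketch below mirrors that under those assumptions; `compare_pointwise` is a hypothetical name and the numbers are made up.

```python
import math
import statistics
from typing import Sequence


def compare_pointwise(elpd_a: Sequence[float], elpd_b: Sequence[float]) -> tuple[float, float]:
    """Return (elpd_diff, se_diff) for model A minus model B.

    Both sequences must hold pointwise elpd values for the same observations,
    in the same order (e.g. after any reloo substitutions).
    """
    if len(elpd_a) != len(elpd_b):
        raise ValueError("models must be evaluated on the same observations")
    diffs = [a - b for a, b in zip(elpd_a, elpd_b)]
    # Standard error of a sum of N pointwise differences: sqrt(N) * sd(diffs)
    return sum(diffs), math.sqrt(len(diffs)) * statistics.stdev(diffs)


diff, se = compare_pointwise([-1.0, -2.0, -3.0], [-1.5, -2.5, -2.0])
print(diff, se)
```

A difference that is small relative to its standard error means the data do not clearly separate the two models, regardless of how many observations had high k.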

edit: I should probably add that many data points with high k can be an indicator of model misspecification (where the definition of “many” is of course context-dependent). In your case, I wouldn’t be worried at all, but you could check whether those points with high k were labelled correctly; sometimes you can catch data-entry errors this way.

#3

This is a very thorough and easy to understand explanation, thanks!
