Advice about LOO

Hi all (especially @avehtari).

I’d be curious to hear thoughts on the following situation. I am writing a paper that improves on a prominently published model by adding an additional covariate. My prior belief is that adding the covariate is essentially certain to improve the model’s predictive performance. Still, for purposes of presentation, I would like to show evidence that the new model predicts better. The problem is that the dataset is fairly large, the model is a bit slow, LOO returns a large number of bad Pareto k’s, and moment matching has run for several days with no sign of stopping. The good news is that LOO without moment matching (and with lots of bad Pareto k’s) shows a very large improvement, as predicted: elpd_diff = -370 and se_diff = 25.
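For concreteness, here is a minimal sketch of the kind of comparison workflow I mean, assuming two brms fits with hypothetical names `fit_old` (the published model) and `fit_new` (with the added covariate):

```r
library(brms)  # assuming brmsfit objects; adapt for rstan/cmdstanr fits
library(loo)

# Hypothetical fits: fit_old (published model) and fit_new (adds the covariate)
loo_old <- loo(fit_old)
loo_new <- loo(fit_new)

# Inspect elpd_loo, p_loo, and the Pareto k diagnostics before trusting anything
print(loo_old)
print(loo_new)

# Moment matching avoids full refits for high-k observations, but can be slow
# loo_new_mm <- loo(fit_new, moment_match = TRUE)

# The elpd_diff / se_diff quoted above come from this kind of comparison
loo_compare(loo_old, loo_new)
```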

How would you proceed in such a case? Doing 10-fold CV is an option, but I’d prefer to avoid it since it would take a week or so, and it’s a lot of computation just to tell me what I already know, i.e. that the new model has better predictive performance. Is there a point where the difference is so stark that it’s acceptable to just report elpd_diff and its standard error while acknowledging the bad Pareto k’s? If so, would you say I have reached that point?
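If 10-fold CV does turn out to be necessary, a sketch of the fallback I have in mind (again assuming the hypothetical brms fits `fit_old` and `fit_new`):

```r
library(brms)

# Exact 10-fold CV refits each model 10 times, hence the runtime concern above
kf_old <- kfold(fit_old, K = 10)
kf_new <- kfold(fit_new, K = 10)

loo_compare(kf_old, kf_new)
```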

Can you show the loo output? It will be easier to make a recommendation if I can also see the total number of parameters, p_loo, and the distribution of k-values across the diagnostic ranges.
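For example, that information can be pulled straight from the loo object (a sketch, assuming it is called `loo_new` as above):

```r
library(loo)

# The printed loo object reports elpd_loo, p_loo, and how many Pareto k values
# fall into each diagnostic range
print(loo_new)

# The same k-value breakdown as a table, plus indices of the problem observations
pareto_k_table(loo_new)
pareto_k_ids(loo_new, threshold = 0.7)
```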

Have you done any posterior predictive checking? That is, is there a possibility that the high k-values are due to model misspecification (the alternative being that the models are very flexible)?
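For example, a quick graphical check (a sketch, assuming the hypothetical brms fit `fit_new`):

```r
library(brms)

# Graphical posterior predictive checks
pp_check(fit_new, ndraws = 100)                  # density overlay (default type)
pp_check(fit_new, type = "stat", stat = "mean")  # compare a summary statistic
```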
