Hi all (especially @avehtari).
I’d be curious to hear thoughts on the following situation. I am writing a paper that improves on a prominently published model by adding a covariate. My prior belief is that adding the covariate improves the model’s predictive performance with probability 1. Still, for purposes of presentation, I would like to show evidence that the new model predicts better. The problem is that the dataset is fairly large, the model is slow to fit, LOO returns a large number of bad Pareto k values, and moment matching has been running for several days with no sign of stopping. The good news is that LOO without moment matching (and with many bad Pareto k values) shows the very large improvement I predicted: elpd_diff = -370 and se_diff = 25.
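For concreteness, here is a minimal sketch of the comparison described above, assuming two brms fits named fit_old and fit_new (hypothetical names; the actual model and fitting code are not shown here):

```r
library(brms)
library(loo)

# PSIS-LOO for each model; this is where the bad Pareto k warnings appear
loo_old <- loo(fit_old)
loo_new <- loo(fit_new)

# Count observations in each Pareto k diagnostic range
print(pareto_k_table(loo_new))

# Model comparison; the first row is the preferred model, and the second
# row carries the elpd_diff and se_diff values quoted above
loo_compare(loo_old, loo_new)
```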
How would you proceed in such a case? Doing 10-fold CV is an option, but I’d prefer to avoid it: it would take a week or so, and it’s a lot of computation just to confirm what I already know, namely that the new model has better predictive performance. Is there a point where the difference is so stark that it’s acceptable to just report elpd_diff and its standard error while acknowledging the bad Pareto k values? If so, would you say that I have reached that point?