Hello out-of-sample predictive accuracy experts,
A long time ago, I came across this line in an unpublished manuscript from Vehtari and Gelman that was formerly archived on stat.columbia.edu (traces of the manuscript can now only be found on CiteSeer). It highlighted, at a high level, a difference between the prediction tasks that WAIC and LOO address:
In practice, when there is a difference between WAIC and LOO as here with large data scaling factors, the modeler should decide whether the goal is predictions for new schools or for these same eight schools in a hypothetical replicated experiment. In the first case the modeler should use LOO and in the second case the modeler should use WAIC.
When I read this years ago, during my initial exposure to Bayesian predictive accuracy metrics, I found it to be an intuitive line that helped me process what was going on. However, now that I revisit it, I'm having trouble zeroing in on which equations and expressions in Gelman et al., 2014 and Vehtari et al., 2017 give rise to that intuition. In Gelman et al., 2014, I see that the first three terms of the WAIC Taylor expansion match those of LOO, but I'm not sure how the differing fourth terms change the practical purpose of each metric. I'd appreciate it if someone could point me to the relevant equations in those or other publications. Thanks!
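For concreteness, here is a minimal Python sketch of how I understand the two estimates being computed from a matrix of pointwise log-likelihood draws. The toy model (normal data with known unit variance and a flat prior on the mean), the function names, and the use of raw importance sampling rather than the Pareto-smoothed version from Vehtari et al., 2017 are all my own assumptions for illustration, not anything taken from the manuscript.

```python
import numpy as np


def log_mean_exp(a, axis=0):
    """Numerically stable log(mean(exp(a))) along `axis`."""
    a_max = np.max(a, axis=axis, keepdims=True)
    out = np.log(np.mean(np.exp(a - a_max), axis=axis, keepdims=True)) + a_max
    return np.squeeze(out, axis=axis)


def pointwise_lppd(log_lik):
    """Log pointwise predictive density; log_lik has shape (draws, observations)."""
    return log_mean_exp(log_lik, axis=0)


def elpd_waic(log_lik):
    """WAIC estimate: lppd minus the posterior variance of the log-likelihood."""
    p_waic = np.var(log_lik, axis=0, ddof=1)
    return float(np.sum(pointwise_lppd(log_lik) - p_waic))


def elpd_loo_is(log_lik):
    """Raw importance-sampling LOO (no Pareto smoothing, an assumption here):
    p(y_i | y_-i) is approximated by 1 / mean_s(1 / p(y_i | theta_s))."""
    return float(np.sum(-log_mean_exp(-log_lik, axis=0)))


# Hypothetical toy data: y_i ~ Normal(mu, 1) with a flat prior on mu,
# so the posterior is mu | y ~ Normal(mean(y), 1 / n).
rng = np.random.default_rng(0)
n, n_draws = 20, 4000
y = rng.normal(size=n)
mu_draws = rng.normal(y.mean(), 1.0 / np.sqrt(n), size=n_draws)

# Pointwise log-likelihood matrix of shape (n_draws, n).
log_lik = -0.5 * np.log(2 * np.pi) - 0.5 * (y[None, :] - mu_draws[:, None]) ** 2

lppd_total = float(np.sum(pointwise_lppd(log_lik)))
waic = elpd_waic(log_lik)
loo = elpd_loo_is(log_lik)
print("lppd:", lppd_total, "elpd_waic:", waic, "elpd_loo:", loo)
```

In a simple, well-identified model like this one I would expect the two estimates to nearly coincide, which is why I'm curious about where exactly they diverge in harder cases such as the eight schools example with large data scaling factors.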
Edit: I suppose I should tag @avehtari for this.