Cross-validation FAQ

I have a question relating to Topic 4 (how is CV related to overfitting). In Regression and Other Stories (p. 208–9), the authors state:

[…] median LOO R^2, at 0.17, is much lower than the median Bayesian R^2 of 0.31, indicating overfitting.

Given that the model in question was fit to real data (whose DGM we don’t know) rather than fake data (whose DGM we would know), I don’t see how the authors know that the gap between self-predictive R^2 and LOO R^2 represents overfitting. Doesn’t it just represent model instability? Isn’t it perfectly possible that the sample, despite being quite small relative to the number of parameters estimated, is perfectly representative of the DGM, i.e. that the non-cross-validated posterior means are unstable but correct?

This might just be semantics. I would say that it’s perfectly possible that an overfitted model happens by chance to land near the correct parameter estimates. The hallmark of overfitting is not the incorrectness but rather the instability itself (which makes it impossible to be confident of the correctness).


No self-predictive R^2 was computed there; what the book reports is the posterior-predictive R^2. See Vehtari and Ojanen (2012) for the definitions.
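
If a concrete definition helps, the Bayesian (posterior-predictive) R^2, as defined in Gelman et al. (2019, "R-squared for Bayesian regression models"), is computed per posterior draw $s$ as

$$
R^2_s = \frac{\operatorname{Var}(\hat{y}_s)}{\operatorname{Var}(\hat{y}_s) + \operatorname{Var}_{\mathrm{res},s}},
$$

where $\operatorname{Var}(\hat{y}_s)$ is the variance over data points of that draw's predictive means and $\operatorname{Var}_{\mathrm{res},s}$ is that draw's modelled residual variance. The LOO variant replaces the in-sample predictive quantities with leave-one-out ones, so each observation is predicted without using itself.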

The gap between posterior-predictive and LOO-predictive performance estimates is always due to fitting to the specific data. We do want some fitting to the data, otherwise using the data would not make sense. In this case, the gap is big enough, given the modeller's experience, that the modeller can be confident there is more fitting to the data than necessary. The overfitting is confirmed by Figure 12.11 (the first printing had the wrong subplots), where we can see that the first prior strongly favors higher R^2 values and thus favors overfitting. With the 2nd and 3rd priors, the prior is flatter around the region of substantial likelihood, and the posterior and LOO criteria are close to each other. In these cases, a Bayesian R^2 larger than 0.31 is still likely, and such good predictive performance is not improbable.
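
To make the comparison concrete, here is a minimal rstanarm sketch (not the book's actual example; `d`, `y`, `x1`, `x2` and the prior scales are placeholders) showing how one would compare the posterior-predictive and LOO criteria under priors of different widths:

```r
# Minimal sketch: compare posterior-predictive (Bayesian) R^2 with LOO R^2
# under a wide and a narrower prior. `d` is assumed to hold outcome `y`
# and predictors `x1`, `x2`; these are placeholders, not the book's data.
library(rstanarm)

fit_wide   <- stan_glm(y ~ x1 + x2, data = d,
                       prior = normal(0, 10), refresh = 0)  # weak prior
fit_narrow <- stan_glm(y ~ x1 + x2, data = d,
                       prior = normal(0, 1), refresh = 0)   # tighter prior

# Posterior-predictive R^2 is computed on the same data the model was fit to;
# LOO R^2 approximates out-of-sample performance. A large gap between the two
# medians is the kind of symptom discussed above.
round(c(bayes_R2 = median(bayes_R2(fit_wide)),
        loo_R2   = median(loo_R2(fit_wide))), 2)
round(c(bayes_R2 = median(bayes_R2(fit_narrow)),
        loo_R2   = median(loo_R2(fit_narrow))), 2)
```

The point here is only the mechanics of the comparison; the actual numbers depend on the data and the priors.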

The mean of a performance estimate with a bigger bias (here the posterior-predictive R^2) can, just by chance, be closer to the true future predictive performance than one with a smaller bias (here the LOO R^2).
