Edit: what is written here is not entirely correct. See below for more.
LOO-CV can be expected to work as long as:
It passes its own diagnostics (Pareto k values not too high)
The points are conditionally independent, such that the likelihood is the product of the pointwise likelihoods.
At least one candidate model is reasonably well specified.
Thus, if your data aren't identically distributed, but your models assume that they are, then you might run afoul of the final bullet point above. On the other hand, if your models are sufficiently flexible to capture (a good approximation to) the generative process, then you'll be fine. However, when data are not identically distributed, I'd hazard the guess that in general there's a greater chance of highly influential points in the analysis. These might make it harder for LOO-CV to work, but should be picked up by the Pareto k diagnostics mentioned in bullet 1 above. So in your position I'd prepare for the possibility that LOO-CV might not work too well. A fallback that should work more reliably (provided that bullet points 2 and 3 above are satisfied) is to do brute-force k-fold cross-validation.
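To make the k-fold fallback concrete, here is a minimal sketch. It assumes a toy conjugate model (normal observations with known sigma and a normal prior on the mean), chosen only because its posterior predictive is available in closed form; the function names `posterior_mu` and `kfold_elpd` are made up for illustration and are not part of any package. With a real model you would refit with MCMC on each training split instead.

```python
import numpy as np

def log_normal_pdf(y, mean, var):
    """Log density of N(mean, var) evaluated at y."""
    return -0.5 * (np.log(2 * np.pi * var) + (y - mean) ** 2 / var)

def posterior_mu(y_train, sigma=1.0, prior_mean=0.0, prior_sd=10.0):
    """Conjugate posterior (mean, variance) for mu in y ~ N(mu, sigma^2)."""
    prec = 1.0 / prior_sd**2 + len(y_train) / sigma**2
    mean = (prior_mean / prior_sd**2 + y_train.sum() / sigma**2) / prec
    return mean, 1.0 / prec

def kfold_elpd(y, k=10, sigma=1.0, seed=0):
    """Brute-force k-fold estimate of the expected log predictive density."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    pointwise = np.empty(len(y))
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        # Refit on the training folds only.
        m, v = posterior_mu(y[train_mask], sigma=sigma)
        # Posterior predictive variance = posterior variance + noise variance.
        pointwise[test_idx] = log_normal_pdf(y[test_idx], m, v + sigma**2)
    return pointwise.sum(), pointwise

# Usage on simulated data (hypothetical example values).
y = np.random.default_rng(123).normal(loc=1.0, scale=1.0, size=100)
elpd, pointwise = kfold_elpd(y, k=10)
print(elpd)
```

Setting `k=len(y)` in this sketch gives exact leave-one-out, which is what PSIS-LOO approximates without refitting.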
Also tagging @avehtari in case I've gotten ahead of myself in anything I've said!
But I'm still unsure if cross-validation can be applied at all.
I am intrigued what you think about the following points:
As stated, for example, by Sumio Watanabe (last sentence of the first paragraph of section "2.1 Definitions of statistical inference"):
"The cross validation procedure needs the i.i.d. condition, whereas information criteria can be used in several not i.i.d. cases as shown in Watanabe (2021)."
The citation "Watanabe (2021)" refers to the following paper, which is about WAIC for mixture models.
From this I conclude that WAIC is applicable to independent but not identically distributed data.
Is this conclusion correct?
A key point which isn't elaborated in detail in the post linked above is the difference between "exchangeability" and "i.i.d.". But just as an example, I'm reasonably certain that it's unproblematic to apply LOO-CV to distributional regression models.
Does the assumption y_i ~ N(mu_i, sigma_i) automatically imply non-identically distributed data?
Or does "identically distributed" in the context of the data generating mechanism (see for example example.pdf given in the first post) mean that the data points are identically distributed according to the unknown, underlying true distribution?
y_i ~ N(mu_i, sigma) implies that the residuals are identically distributed. y_i ~ N(mu_i, sigma_i) implies that the scaled residuals are identically distributed.
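Written out, and assuming the second argument of N is a standard deviation as in Stan (an assumption; the posts above do not say which parameterization is meant), the two cases are:

```latex
\begin{align*}
y_i \sim N(\mu_i, \sigma) &\implies y_i - \mu_i \sim N(0, \sigma) \\
y_i \sim N(\mu_i, \sigma_i) &\implies \frac{y_i - \mu_i}{\sigma_i} \sim N(0, 1)
\end{align*}
```

In both cases the y_i themselves have different distributions whenever the mu_i differ; what is shared across observations is the distribution of the residuals (first case) or of the scaled residuals (second case).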
In either case, we might be willing to treat the data points as exchangeable if and only if we are willing to assume that the distribution of covariates x_i that yield the predictions for \mu_i and \sigma_i is adequately described by the sample of points in our dataset. That is, we can assume that the pairs (x_i, y_i) are exchangeable. But to do this, and to use LOO to predict the performance on future data, we need to additionally assume that future samples from the joint distribution for (x_i, y_i) will look similar to the observed distribution in our data. And this means that we need to assume not only that the true generative model for p(y_i|x_i) won't change (this is an implicit assumption that is baked into just about any assessment of predictive performance), but also that the observed distribution of x_i is a good approximation to the future distribution of the new x_i that we would like to predict.
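The role of the covariate distribution can be illustrated with a small simulation. This is only a sketch under assumptions of my own choosing: a quadratic true relationship, a deliberately misspecified straight-line model, and a plug-in (least-squares) predictive density standing in for a full posterior predictive. All function names are hypothetical.

```python
import numpy as np

def fit_ols(x, y):
    """Least-squares line and residual sd (plug-in stand-in for a posterior)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma = np.sqrt(resid @ resid / (len(y) - 2))
    return beta, sigma

def log_pred(x, y, beta, sigma):
    """Pointwise log predictive density under y ~ N(b0 + b1 * x, sigma)."""
    mean = beta[0] + beta[1] * x
    return -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * ((y - mean) / sigma) ** 2

def simulate(rng, n, x_mean):
    """True process: y = x^2 + noise, so p(y | x) is fixed; only p(x) varies."""
    x = rng.normal(x_mean, 1.0, size=n)
    y = x**2 + rng.normal(0.0, 0.5, size=n)
    return x, y

rng = np.random.default_rng(1)
x, y = simulate(rng, 200, x_mean=0.0)

# Exact leave-one-out by refitting n times.
loo = np.empty(len(y))
for i in range(len(y)):
    keep = np.arange(len(y)) != i
    beta_i, sigma_i = fit_ols(x[keep], y[keep])
    loo[i] = log_pred(x[i], y[i], beta_i, sigma_i)

# Future data: same x distribution versus a shifted x distribution.
beta, sigma = fit_ols(x, y)
x_same, y_same = simulate(rng, 5000, x_mean=0.0)
x_shift, y_shift = simulate(rng, 5000, x_mean=3.0)
mean_same = log_pred(x_same, y_same, beta, sigma).mean()
mean_shift = log_pred(x_shift, y_shift, beta, sigma).mean()
print(loo.mean(), mean_same, mean_shift)
```

In this sketch p(y|x) is identical for all three data sets. The LOO average is an estimate of performance on new data drawn with the same x distribution, while the shifted-x performance is much worse because the straight-line model is only adequate over the range of x it was fitted on.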
The observations don't need to be identically distributed.
LOO can be useful as an internal model consistency check even without an exchangeability assumption.
If LOO is used to estimate the future predictive performance, then we need to assume some exchangeability between the past data and the future data. The simplest form is conditionally i.i.d., but it's not required.
We can assume that the data generating mechanism is changing, but then we need to model that change, and it's possible to combine such a model with LOO-CV. This is rarely done, but there are some examples.