I agree different levels of error can be confusing.

This would be the exact future predictive performance (assuming stationarity)

(also, instead of iid, exchangeability is sufficient)

I tried to say that it matters whether the same computational method is used for p(\tilde{y} | y) and p(y_i | y_{-i}). MCMC is not exact and can sometimes have significant pre-asymptotic bias before the CLT kicks in. If you could compute the LOO predictive densities exactly (without MCMC), they would not include the error that MCMC can make. Thus, if you want to estimate the predictive accuracy given, e.g., 4000 MCMC draws used to estimate p(\tilde{y} | y), then you would also like to compute LOO with 4000 MCMC draws from each LOO posterior. There’s an example showing the difference between exact inference and MCMC inference for the full posterior predictive density in the Implicitly adaptive importance sampling paper.
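To make the idea concrete, here is a minimal sketch in a toy conjugate normal model (known data sd, normal prior on the mean), where each LOO posterior can be sampled exactly. All names and numbers are hypothetical; the point is only that the full-posterior predictive density and every brute-force LOO predictive density use the same number of draws S:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
sigma, tau = 1.0, 10.0          # known data sd and prior sd (toy assumptions)
y = rng.normal(0.5, sigma, size=20)
S = 4000                        # same number of posterior draws everywhere

def posterior_params(data):
    # conjugate normal-normal posterior for the mean mu
    n = len(data)
    prec = 1.0 / tau**2 + n / sigma**2
    mean = (data.sum() / sigma**2) / prec
    return mean, np.sqrt(1.0 / prec)

def log_pred_density(data, y_new):
    # Monte Carlo estimate of log p(y_new | data) from S posterior draws
    m, s = posterior_params(data)
    mu_draws = rng.normal(m, s, size=S)
    lp = norm.logpdf(y_new, mu_draws, sigma)
    return np.logaddexp.reduce(lp) - np.log(S)

# brute-force LOO: refit (here, the exact conjugate posterior) for each i,
# drawing S draws from each LOO posterior, matching the full-posterior setup
elpd_loo = sum(log_pred_density(np.delete(y, i), y[i]) for i in range(len(y)))
print(elpd_loo)
```

With real MCMC the refits would be expensive, which is exactly why PSIS is used to approximate the LOO posteriors from the full-posterior draws instead.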

Correspondingly, if we used VI for the full posterior and to make predictions, we would like to compute LOO with VI run for each LOO posterior.

Now, if running MCMC for the LOO posteriors is replaced by a PSIS approximation, there is a mismatch between the methods and there can be additional error.

So there would be three sources of error:

1. cross-validation error
2. Monte Carlo error (or other inference error)
3. mismatch from using different computation for the full and LOO predictive densities

We’re mostly able to estimate the variance part of these errors, and we can estimate the bias in the cross-validation error, but estimating the possible bias from the Monte Carlo error and the computation mismatch is difficult without a lot of extra computation.
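For the variance part, the standard estimator uses the spread of the pointwise LOO log predictive densities. A minimal sketch with hypothetical pointwise values (the numbers are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical pointwise LOO log predictive densities for n = 50 observations
elpd_i = rng.normal(-1.3, 0.4, size=50)

n = len(elpd_i)
elpd = elpd_i.sum()
# standard SE estimate for the cross-validation part of the error
se = np.sqrt(n * np.var(elpd_i))
print(elpd, se)
```

The Monte Carlo part can be assessed, e.g., by recomputing the estimate from independent sets of draws and looking at the spread; PSIS additionally reports Pareto k diagnostics for the reliability of each pointwise term. Neither of these gives the bias directly.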

It would be easier to just say yes, but the true answer is that we don’t need to: in the same way as usual in Bayesian inference, we don’t need to assume i.i.d., only exchangeability. We try to say this in the paper as well, with different words.

Thanks, we’re just revising the paper. Ping @mans_magnusson