LOO-R2: With or without Bayesian bootstrap?

In this case study, the LOO-adjusted R^2 uses the Bayesian bootstrap (in addition to the LOO-CV-based predictions \hat{y}_{\text{loo},n}). In contrast, this answer and brms seem to omit the Bayesian bootstrap.

The case study from above suggests (but does not explicitly say) that the Bayesian bootstrap accounts for the fact that the true data-generating distribution (for y) is unknown. But couldn’t one argue that the true data-generating distribution is reflected by the observed data? If yes, then wouldn’t the Bayesian bootstrap introduce additional sampling uncertainty (sampling in the frequentist sense of repeating the data observation process)? If this is correct, then wouldn’t it make sense to omit the Bayesian bootstrap (as in the answer linked above and in brms)?

So my question is: Is it incorrect to use the Bayesian bootstrap for the LOO-adjusted R^2 or is the Bayesian bootstrap optional for the LOO-adjusted R^2?

See Uncertainty in Bayesian Leave-One-Out Cross-Validation Based Model Comparison for an explanation. That paper uses elpd as the example, but the same reasoning about the sources of uncertainty and variation holds for LOO-R2.


Thanks for your reply. Do I understand your pointer to the paper correctly, namely that the Bayesian bootstrap takes into account the uncertainty which arises when regarding the LOO-adjusted R^2 as a frequentist estimator? In that sense, is it correct that the Bayesian bootstrap would be optional for the LOO-adjusted R^2 (depending on whether or not this frequentist uncertainty should be taken into account)?

No. The Bayesian bootstrap is, as the name says, a Bayesian approach. You may have been confused because the paper analyses the frequency properties of a Bayesian approach.

The Bayesian bootstrap is one way to represent our lack of knowledge about the future data distribution. If we assume that the future data come from the same distribution that generated the observed data, then we can use the observed data as a proxy, but because we have only a finite number of observations, there is uncertainty about the actual distribution. The Bayesian bootstrap is equivalent to having a Dirichlet distribution model for the future data. It’s a somewhat silly, simple model, but it makes minimal assumptions and happens to get the first moments right with easy-to-implement computation. It’s optional in the sense that there are other algorithms for the same task…
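To make the idea concrete, here is a minimal sketch in Python (standard library only). It is not the rstanarm or brms code. It assumes the LOO predictions \hat{y}_{\text{loo},n} have already been computed elsewhere (for example with PSIS-LOO), and the function names and toy numbers are hypothetical. Each Bayesian bootstrap draw is a Dirichlet(1, ..., 1) weight vector over the observations, and the R^2 is recomputed with weighted variances of y and of the LOO residuals.

```python
import random


def dirichlet_weights(n, rng):
    """One Dirichlet(1, ..., 1) draw, built from normalized Exp(1) variables."""
    g = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(g)
    return [x / total for x in g]


def weighted_var(values, weights):
    """Variance of `values` under observation weights that sum to 1."""
    mean = sum(w * v for w, v in zip(weights, values))
    return sum(w * (v - mean) ** 2 for w, v in zip(weights, values))


def loo_r2_point(y, yhat_loo):
    """LOO-R2 without the Bayesian bootstrap (equal weights, single number)."""
    n = len(y)
    w = [1.0 / n] * n
    e_loo = [yh - yi for yh, yi in zip(yhat_loo, y)]
    return 1.0 - weighted_var(e_loo, w) / weighted_var(y, w)


def loo_r2_bb(y, yhat_loo, n_draws=4000, seed=1):
    """LOO-R2 draws reflecting uncertainty about the future data distribution."""
    rng = random.Random(seed)
    e_loo = [yh - yi for yh, yi in zip(yhat_loo, y)]
    draws = []
    for _ in range(n_draws):
        w = dirichlet_weights(len(y), rng)
        draws.append(1.0 - weighted_var(e_loo, w) / weighted_var(y, w))
    return draws


# Hypothetical toy data: observations and precomputed LOO predictions.
y = [1.0, 2.1, 2.9, 4.2, 5.1, 5.8]
yhat_loo = [1.2, 1.9, 3.1, 4.0, 4.9, 6.1]

point = loo_r2_point(y, yhat_loo)
draws = sorted(loo_r2_bb(y, yhat_loo))
# Rough 90% interval taken from the sorted draws.
interval = (draws[int(0.05 * len(draws))], draws[int(0.95 * len(draws))])
```

Without the bootstrap you get only the single number `point`. With it you get a distribution of R^2 values whose spread shrinks as the number of observations grows.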

Ok, but then this answer and brms (same links as above) do not take into account the uncertainty you are talking about? (I guess we are talking about the same kind of uncertainty, I might just not have expressed it correctly.)

The first answer was a quick first version of the function. The LOO-R2 code in the online appendix for the paper R-squared for Bayesian regression models includes the Bayesian bootstrap, and rstanarm::loo_R2 does that, too. I don’t know why @paul.buerkner decided to leave it out of brms::loo_R2, but he can comment. I added a note to that old posting about the later implemented functions.


No good reason. I will check and then implement it.

Now I see what I had overlooked in this case study: For the LOO-adjusted R^2, there are not S different values of \hat{y}_{\text{loo}, n}, unlike for the residual-based (and also for the model-based) Bayesian R^2 where \hat{y}^{s}_{n} depends on s \in \{1, \dots, S\} (with S denoting the number of posterior draws). Thus, it’s clear that the LOO-adjusted R^2 needs a different way to take the finite-sample uncertainty into account. In the case study, I somehow read \hat{y}^{s}_{\text{loo}, n} instead of \hat{y}_{\text{loo}, n}, which made me wonder why the finite-sample uncertainty was accounted for twice. Sorry for wasting your time on this.

The GitHub version of brms now uses the Bayesian bootstrap in loo_R2.