Are differences as large as what I saw in bayes_R2() expected? Is there a reason why comparing bayes_R2 between models with different priors doesn’t make sense? Is there a reason why the difference in loo_R2() appears to be smaller?

bayes_R2() evaluates the models using the same data they were conditioned on, which distorts the estimate somewhat in small samples.

Because loo_R2() estimates the error for the i-th observation using all but the i-th observation, it is much less prone to the problem just mentioned.

Thank you. I haven’t looked at the algebra behind loo_R2() yet, but I understand (at least in principle) elpd_loo and the like. My understanding is that elpd_loo is conceptually similar to (if not the same thing as) the average conditional predictive ordinate that Gelfand and Dey suggested 10 or 20 years ago. Does loo_R2() use CPOs (or an analog), or is it based on the predictions themselves, making it analogous to the Predictive Error Sum of Squares (PRESS) from the 70s or 80s?
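For reference, the CPO connection can be sketched numerically. Given an S-by-n log-likelihood matrix (the same object loo() consumes), the classic importance-sampling estimate of the conditional predictive ordinate is the harmonic mean of the per-draw likelihoods, and summing the log CPOs gives a (plain, unsmoothed) estimate of elpd_loo; PSIS-LOO stabilizes exactly these importance weights before summing. This is a hedged NumPy illustration of the estimator, not the loo package's implementation, and the log-likelihood matrix here is simulated rather than taken from a fitted model:

```python
import numpy as np

def logsumexp(a, axis=0):
    """Numerically stable log-sum-exp."""
    m = a.max(axis=axis)
    return m + np.log(np.exp(a - m).sum(axis=axis))

rng = np.random.default_rng(0)
S, n = 4000, 151                       # posterior draws, observations

# Toy S x n log-likelihood matrix (stand-in for the matrix loo() consumes;
# a real one would come from the fitted model)
log_lik = rng.normal(loc=-1.4, scale=0.3, size=(S, n))

# CPO_i = p(y_i | y_{-i}) ~= 1 / mean_s[ 1 / p(y_i | theta^(s)) ],
# the harmonic mean of the per-draw likelihoods, computed in log space
log_cpo = -(logsumexp(-log_lik, axis=0) - np.log(S))

# The plain importance-sampling LOO estimate is the sum of log CPOs
elpd_loo_is = log_cpo.sum()

# In-sample lpd for comparison; since the harmonic mean never exceeds the
# arithmetic mean, the LOO estimate is always the more pessimistic of the two
lpd = (logsumexp(log_lik, axis=0) - np.log(S)).sum()

print(elpd_loo_is <= lpd)              # True
```

The harmonic-mean form is exactly the Gelfand–Dey CPO estimator; its importance weights can have infinite variance, which is the instability Pareto smoothing addresses.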

> loo(fit)

Computed from 4000 by 151 log-likelihood matrix

         Estimate   SE
elpd_loo   -211.1 11.1
p_loo        14.2  2.3
looic       422.2 22.2
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     150   99.3%   478
 (0.5, 0.7]   (ok)         0    0.0%   <NA>
   (0.7, 1]   (bad)        1    0.7%   63
   (1, Inf)   (very bad)   0    0.0%   <NA>

See help('pareto-k-diagnostic') for details.
Warning message:
Found 1 observation(s) with a pareto_k > 0.7. We recommend calling 'loo' again with argument 'k_threshold = 0.7' in order to calculate the ELPD without the assumption that these observations are negligible. This will refit the model 1 times to compute the ELPDs for the problematic observations directly.

Re-running with k_threshold = 0.7:

> loo(fit, k_threshold = 0.7)
1 problematic observation(s) found.
Model will be refit 1 times.
Fitting model 1 out of 1 (leaving out observation 102)

Computed from 4000 by 151 log-likelihood matrix

         Estimate   SE
elpd_loo   -211.1 11.1
p_loo        14.1  2.3
looic       422.1 22.2
------
Monte Carlo SE of elpd_loo is 0.2.

All Pareto k estimates are good (k < 0.5).
See help('pareto-k-diagnostic') for details.

> loo(fit_hs)

Computed from 4000 by 151 log-likelihood matrix

         Estimate   SE
elpd_loo   -213.0 10.4
p_looo        8.1  1.5
looic       426.1 20.9
------
Monte Carlo SE of elpd_loo is 0.1.

All Pareto k estimates are good (k < 0.5).
See help('pareto-k-diagnostic') for details.

There are 14 parameters in the regression (12 covariates plus the intercept and sigma), which is close to p_loo in the model with default priors and quite a bit larger than p_loo in the model with regularized horseshoe priors.

It’s hard to believe that it’s been almost 30 years since Gelfand and Dey, but come to think of it, I remember Dipak telling me about CPOs about 15 years ago, and the paper was 10-15 years old then!

Thanks for your help. I’ve seen the GitHub page before. It’s very useful.

Thank you for the very helpful advice. After reading through https://avehtari.github.io/bayes_R2/bayes_R2.html again, I think I have a good handle on why it’s not surprising that bayes_R2() and loo_R2() give rather different results. Let me see if I’ve got this right. For a linear regression:

bayes_R2() uses (\sigma^2)^{(s)} rather than \frac{1}{n-1}\sum_{i=1}^n (y_i - \hat y_i^{(s)})^2 as its estimate of the residual variance at iteration s, where \hat y_i^{(s)} is the in-sample estimate of y_i at iteration s.

loo_R2() uses \frac{1}{n-1}\sum_{i=1}^n (y_i - \hat y_{\mathrm{loo},i}^{(s)})^2 rather than (\sigma^2)^{(s)} as its estimate of the residual variance at iteration s, where \hat y_{\mathrm{loo},i}^{(s)} is the leave-one-out estimate of y_i at iteration s.

We expect loo_R2() < bayes_R2(), because the residuals in loo_R2() are deviations from an estimate that doesn’t include the observation rather than deviations from an estimate that does.
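That inequality is easy to check numerically in the ordinary least-squares analogue, where the leave-one-out residual has the closed form e_i / (1 - h_ii) (this is exactly PRESS), so LOO residuals are always at least as large in magnitude as in-sample residuals. The sketch below uses a plain OLS fit on simulated data as a point-estimate stand-in for the posterior draws; it illustrates the residual-variance comparison, not rstanarm's actual loo_R2() computation:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 151, 12
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # intercept + 12 covariates
beta = rng.normal(size=p + 1)
y = X @ beta + rng.normal(scale=2.0, size=n)

# OLS fit (a point-estimate stand-in for the posterior draws)
H = X @ np.linalg.solve(X.T @ X, X.T)        # hat matrix
e = y - H @ y                                 # in-sample residuals
e_loo = e / (1 - np.diag(H))                  # exact LOO residuals (PRESS)

ss_tot = ((y - y.mean()) ** 2).sum()
r2_in = 1 - (e ** 2).sum() / ss_tot           # in-sample R^2
r2_loo = 1 - (e_loo ** 2).sum() / ss_tot      # PRESS-style LOO R^2

print(r2_loo < r2_in)                          # True: LOO residuals are larger
```

Since every leverage h_ii lies in (0, 1), each LOO residual strictly dominates its in-sample counterpart, so the PRESS-style R^2 is always the smaller one here. The same intuition carries over to the posterior-draw versions, though there the ordering is expected rather than guaranteed.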