bayes_R2, loo_R2 and hs() priors

I am exploring a fairly straightforward multiple regression model:

fit <- stan_glm(Photo ~ WD + Ks + KL_Units + BW_dry_avg + Thick_cm + Area_cm2 + LMA_g.cm2 + 
                LWR + LD_g.cm3 + Length_mm_Top + Density_mm2_Top + SPI_Top,
                data = dat,
                family = gaussian(),
                refresh = 0)

                Median MAD_SD
(Intercept)      0.000  0.080
WD               0.058  0.104
Ks               0.109  0.114
KL_Units        -0.048  0.114
BW_dry_avg      -0.078  0.181
Thick_cm        -0.407  0.622
Area_cm2         0.021  0.168
LMA_g.cm2        0.144  0.577
LWR             -0.337  0.172
LD_g.cm3         0.140  0.423
Length_mm_Top   -1.820  0.566
Density_mm2_Top -1.021  0.343
SPI_Top          0.930  0.327

Since the number of observations (151) is fairly small relative to the number of covariates, I also ran the model using regularized horseshoe priors:

n <- nrow(dat)                      # number of observations
D <- ncol(fit_default_df) - 2       # number of covariates in the model
p0 <- 3                             # prior guess for the number of relevant covariates
tau0 <- p0/(D - p0) * 1/sqrt(n)     # global scale for the regularized horseshoe
prior_coeff <- hs(global_scale = tau0, slab_scale = 1)
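
(For reference, with D = 12 covariates, p0 = 3, and n = 151, this works out to a global scale of roughly 0.027:)

3/(12 - 3) * 1/sqrt(151)   # approximately 0.027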

fit_hs <- stan_glm(Photo ~ WD + Ks + KL_Units + BW_dry_avg + Thick_cm + Area_cm2 + LMA_g.cm2 + 
                   LWR + LD_g.cm3 + Length_mm_Top + Density_mm2_Top + SPI_Top,
                   data = dat,
                   family = gaussian(),
                   refresh = 0,
                   prior = prior_coeff)

                Median MAD_SD
(Intercept)      0.000  0.079
WD               0.000  0.018
Ks               0.000  0.018
KL_Units         0.000  0.018
BW_dry_avg      -0.004  0.022
Thick_cm        -0.038  0.066
Area_cm2         0.000  0.020
LMA_g.cm2       -0.003  0.025
LWR             -0.097  0.144
LD_g.cm3         0.028  0.055
Length_mm_Top   -0.026  0.055
Density_mm2_Top  0.000  0.022
SPI_Top         -0.003  0.024

So far so good. I was curious to see how well (in an R^2 sense) the model with regularized horseshoe priors did compared to the one with default priors.

median(bayes_R2(fit))
[1] 0.237
median(bayes_R2(fit_hs))
[1] 0.065

(I truncated the output to 3 decimals.) Interestingly, loo_R2() shows a much smaller difference between the two models:

median(loo_R2(fit))
[1] 0.058
median(loo_R2(fit_hs))
[1] 0.027

Are differences as large as what I saw in bayes_R2() expected? Is there a reason why comparing bayes_R2 between models with different priors doesn’t make sense? Is there a reason why the difference in loo_R2() appears to be smaller?

Kent

For small datasets, perhaps.

It evaluates the models using the same data they were conditioned on, which distorts things a bit in small samples.

Because it estimates the error for the i-th observation using all but the i-th observation, it is much less prone to the problem just mentioned.

Thank you. I haven’t looked at the algebra behind loo_R2() yet, but I understand (at least in principle) elpd_loo and the like. My understanding is that elpd_loo is conceptually similar to (if not the same thing as) the average conditional predictive ordinate that Gelfand and Dey suggested 10 or 20 years ago. Does loo_R2() use CPOs (or an analog), or is it based on the predictions themselves, making it analogous to the Predictive Error Sum of Squares (PRESS) from the 70s or 80s?

The code for it is barely 50 lines long.

I think you would say that it is more similar to PRESS.
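
Not the actual rstanarm implementation, but here is a rough point-estimate sketch of that PRESS-like idea, using PSIS weights from the loo package to get leave-one-out predictive means without refitting (the real loo_R2() also propagates uncertainty, so treat this only as the conceptual core):

library(loo)
ll <- log_lik(fit)             # S x n matrix of pointwise log-likelihoods
mu <- posterior_linpred(fit)   # S x n draws of the conditional mean of each y_i (Gaussian, identity link)

# PSIS-smoothed importance weights for leaving out each observation in turn
# (r_eff omitted here for brevity; see ?psis)
psis_obj <- psis(-ll)

# leave-one-out predictive mean of each y_i, no refitting needed
mu_loo <- E_loo(mu, psis_obj, type = "mean", log_ratios = -ll)$value

# PRESS-style R^2 from the leave-one-out residuals
y <- dat$Photo
1 - sum((y - mu_loo)^2) / sum((y - mean(y))^2)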

Thank you. This is extremely helpful. (And thank you for your patience.)

They are believable. If you show the full output of loo(fit) and loo(fit_hs), I can say more.

Also, instead of the median, try plotting histograms, e.g.

library(ggplot2)
ggplot() + geom_histogram(aes(x=bayes_R2(fit)), breaks=seq(0,1,length.out=100)) +
  xlim(c(0,1)) +
  scale_y_continuous(breaks=NULL) +
  labs(x="Bayesian R^2", y="")

and

ggplot() + geom_histogram(aes(x=loo_R2(fit)), breaks=seq(0,1,length.out=100)) +
  xlim(c(0,1)) +
  scale_y_continuous(breaks=NULL) +
  labs(x="LOO R^2", y="")

It’s the same, but with more robust computation and a self-diagnostic (and it’s been almost 30 years).
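
In case it helps to see the correspondence: the classical CPO is the harmonic-mean estimate of the leave-one-out predictive density for each observation, and PSIS-LOO estimates the same pointwise quantity with smoothed weights plus the Pareto-k diagnostic. A rough sketch (the harmonic-mean version can be numerically unstable, which is part of why PSIS is preferable):

ll <- log_lik(fit)                    # S x n pointwise log-likelihoods

# classical CPO_i: harmonic mean of the likelihood over posterior draws
log_cpo <- -log(colMeans(exp(-ll)))

# PSIS-LOO pointwise elpd targets the same log p(y_i | y_-i)
elpd_i <- loo(fit)$pointwise[, "elpd_loo"]

plot(log_cpo, elpd_i,
     xlab = "log CPO (harmonic mean)", ylab = "PSIS-LOO pointwise elpd")
abline(0, 1)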

See also Bayesian R2 and LOO-R2

The histograms look quite different from one another.

Here’s bayes_R2(fit):

[histogram of bayes_R2(fit)]

and here’s loo_R2(fit):

[histogram of loo_R2(fit)]

> loo(fit)

Computed from 4000 by 151 log-likelihood matrix

         Estimate   SE
elpd_loo   -211.1 11.1
p_loo        14.2  2.3
looic       422.2 22.2
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     150   99.3%   478       
 (0.5, 0.7]   (ok)         0    0.0%   <NA>      
   (0.7, 1]   (bad)        1    0.7%   63        
   (1, Inf)   (very bad)   0    0.0%   <NA>      
See help('pareto-k-diagnostic') for details.
Warning message:
Found 1 observation(s) with a pareto_k > 0.7. We recommend calling 'loo' again with argument 'k_threshold = 0.7' in order to calculate the ELPD without the assumption that these observations are negligible. This will refit the model 1 times to compute the ELPDs for the problematic observations directly.

Re-running with k_threshold = 0.7:

> loo(fit, k_threshold = 0.7)
1 problematic observation(s) found.
Model will be refit 1 times.

Fitting model 1 out of 1 (leaving out observation 102)

Computed from 4000 by 151 log-likelihood matrix

         Estimate   SE
elpd_loo   -211.1 11.1
p_loo        14.1  2.3
looic       422.1 22.2
------
Monte Carlo SE of elpd_loo is 0.2.

All Pareto k estimates are good (k < 0.5).
See help('pareto-k-diagnostic') for details.
> loo(fit_hs)

Computed from 4000 by 151 log-likelihood matrix

         Estimate   SE
elpd_loo   -213.0 10.4
p_loo         8.1  1.5
looic       426.1 20.9
------
Monte Carlo SE of elpd_loo is 0.1.

All Pareto k estimates are good (k < 0.5).
See help('pareto-k-diagnostic') for details.

There are 14 parameters in the regression (12 covariates plus the intercept and sigma), which is close to p_loo in the model with default priors and quite a bit larger than p_loo in the model with regularized horseshoe priors.

It’s hard to believe that it’s been almost 30 years since Gelfand and Dey, but come to think of it, I remember Dipak telling me about CPOs about 15 years ago, and the paper was 10-15 years old then!

Thanks for your help. I’ve seen the GitHub page before. It’s very useful.

Kent

But not surprisingly different.

The loo diagnostics look good, so I’d say you can trust the LOO-R2 results.

Thank you for the very helpful advice. After reading through https://avehtari.github.io/bayes_R2/bayes_R2.html again, I think I have a good handle on why it’s not surprising that bayes_R2() and loo_R2() give rather different results. Let me see if I’ve got this right. For a linear regression:

  1. bayes_R2() uses $(\sigma^2)^{(s)}$, rather than $\frac{1}{n-1}\sum_{i=1}^n (y_i - \hat y_i^{(s)})^2$, as its estimate of the residual variance at iteration $s$, where $\hat y_i^{(s)}$ is the estimate of $y_i$ at iteration $s$.
  2. loo_R2() instead uses $\frac{1}{n-1}\sum_{i=1}^n (y_i - \hat y_{\mathrm{loo},i}^{(s)})^2$ as its estimate of the residual variance, where $\hat y_{\mathrm{loo},i}^{(s)}$ is the leave-one-out estimate of $y_i$ at iteration $s$.

We expect loo_R2() < bayes_R2(), because the residuals in loo_R2() are deviations from a prediction that doesn’t include the observation rather than from one that does, so their variance tends to be larger than what $(\sigma^2)^{(s)}$ reflects.
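
In symbols, as I read the notebook (the actual functions also propagate posterior and resampling uncertainty, so this is only the point-estimate skeleton):

$$
R^2_{\text{bayes}}{}^{(s)} \;=\; \frac{\operatorname{Var}_i\!\left(\hat y_i^{(s)}\right)}{\operatorname{Var}_i\!\left(\hat y_i^{(s)}\right) + (\sigma^2)^{(s)}},
\qquad
R^2_{\text{loo}} \;=\; 1 - \frac{\operatorname{Var}_i\!\left(y_i - \hat y_{\text{loo},i}\right)}{\operatorname{Var}_i\!\left(y_i\right)},
$$

where $\operatorname{Var}_i$ denotes the sample variance over the $n$ observations and $\hat y_{\text{loo},i}$ is the leave-one-out predictive mean of $y_i$.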