R2 vs bayes_R2

I have estimated stan_lm models using rstanarm.

One of the estimates is labelled R2.

There is also the bayes_R2.stanreg function.

Both the single R2 estimate (with mean, sd, percentiles, etc.) and the vector of estimates returned by bayes_R2 come from the posterior draws.

I don’t understand the distinction/purposes of the two R2’s.

I haven’t found similar questions in the forum.

Nathan

I don’t know the difference, but check here: http://hbiostat.org/papers/rms/accuracy/bayes/gel18r2.pdf

The bayes_R2 function can be called on any generalized linear model (stan_glm), even those with group-specific parameters (stan_glmer). For the particular function stan_lm, the R^2 is a primitive parameter whose posterior distribution is being estimated, whereas bayes_R2 is essentially a generated quantity rather than a primitive parameter. Either way, they are referring to the same concept and should have the same distribution theoretically.
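
To make the "generated quantity" idea concrete, here is a minimal base R sketch that computes one R^2 value per posterior draw from coefficient and residual sd draws. Everything in it is simulated for illustration (the design matrix, the draws, and all object names are assumptions), and it follows the variance-ratio definition from the paper linked above rather than reproducing rstanarm's internal code.

```r
# Simulated stand-ins for posterior draws; none of these values come from a real fit.
set.seed(123)
N <- 100   # number of observations (hypothetical)
S <- 400   # number of posterior draws (hypothetical)

x <- cbind(1, rnorm(N), rnorm(N))            # N x 3 design matrix with intercept
beta_draws <- cbind(rnorm(S, 1.0, 0.05),
                    rnorm(S, 0.5, 0.05),
                    rnorm(S, -0.3, 0.05))    # S x 3 matrix of coefficient draws
sigma_draws <- abs(rnorm(S, 1.0, 0.05))      # S draws of the residual sd

# Linear predictor for every draw and observation: an S x N matrix.
mu <- beta_draws %*% t(x)

# One R^2 per draw: variance of the fitted values divided by
# (that variance plus the residual variance for the same draw).
var_mu <- apply(mu, 1, var)
r2_draws <- var_mu / (var_mu + sigma_draws^2)

summary(r2_draws)
```

The result is a vector with one value per draw, which is why bayes_R2 returns a vector while the R2 parameter of stan_lm shows up as a single row in the fit summary.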


If you find bayes_R2 useful and want to report it in your work, I suggest you also try loo_R2, which tends to suffer less from potential overfitting. See https://avehtari.github.io/bayes_R2/bayes_R2.html for more info.


Thanks for the several comments. I have looked at the Am Stat paper with supplement.

My stan_lm model is log(continuous response) ~ 33 predictors (15 covariates and factors); the posterior sample size is 32000.

The “primitive” parameter R2 is 0.4 with 90% CI (0.4, 0.4), sd 0.0, mcse 0.0 and Rhat 1.001.

I attempted to use bayes_R2 and loo_R2 on the fitted object.

I am working in a container with 120 Gb of memory on a Linux server running R 3.6.1 and rstanarm 2.19.2.

Both the bayes_R2 and the loo_R2 calls returned:

Error: cannot allocate vector of size 263.3 Gb.

I will use the R2 reported with the fit.

Is the bayes_R2 error expected with the posterior sample size of my fit?

Nathan

Could you run bayes_R2 again and call traceback() immediately after the error? I’d like to see exactly where this allocation is being attempted.

StandardizedOME.stan_lm.5c.R2 <- bayes_R2(StandardizedOME.stan_lm.5c)
Error: cannot allocate vector of size 263.3 Gb

traceback()
7: linear_predictor.matrix(beta, x, data$offset)
6: linear_predictor(beta, x, data$offset)
5: pp_eta(object, data = dat, draws = draws)
4: posterior_linpred.stanreg(object, transform = TRUE, re.form = re.form)
3: posterior_linpred(object, transform = TRUE, re.form = re.form)
2: bayes_R2.stanreg(StandardizedOME.stan_lm.5c)
1: bayes_R2(StandardizedOME.stan_lm.5c)

Here is the same for loo_R2:

StandardizedOE.stan_lm.5c <- loo_R2(StandardizedOME.stan_lm.5c)
Error: cannot allocate vector of size 263.3 Gb

traceback()
5: vapply(seq_len(args$N), FUN = function(i) {
as.vector(fun(data_i = args$data[i, , drop = FALSE], draws = args$draws))
}, FUN.VALUE = numeric(length = args$S))
4: log_lik.stanreg(object)
3: log_lik(object)
2: loo_R2.stanreg(StandardizedOME.stan_lm.5c)
1: loo_R2(StandardizedOME.stan_lm.5c)
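
Both tracebacks stop where a matrix with one row per posterior draw and one column per observation would be built, so the size of the failed allocation can be checked with a little arithmetic. The sketch below assumes the request is exactly one draws-by-observations matrix of 8-byte doubles and that R reports "Gb" in units of 1024^3 bytes. The number of observations is not given in the thread, so implied_N is only a back-of-the-envelope inference, and matrix_gb is a hypothetical helper.

```r
S <- 32000                 # posterior sample size, from the post above
bytes_per_double <- 8      # R stores numeric values as 8-byte doubles
reported_gb <- 263.3       # size from the error message
gib <- 1024^3              # assumed meaning of "Gb" in R's error message

# Size in Gb of an S x N matrix of doubles (hypothetical helper).
matrix_gb <- function(S, N) S * N * bytes_per_double / gib

# Number of observations implied by the error, under the assumptions above.
implied_N <- reported_gb * gib / (S * bytes_per_double)
round(implied_N)

# Sanity check: plugging the implied N back in recovers the reported size.
matrix_gb(S, round(implied_N))
```

Under these assumptions the error corresponds to roughly 1.1 million observations, and a single such matrix is already more than twice the 120 Gb available in the container.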

The R2 reported with the fit is all you need. bayes_R2 is no better an estimate, and loo_R2 tends to differ only with small datasets. You can print a second decimal place with print(fit, digits = 2).
