I am doing multiple regression in rstan and have the following code:
summary(fit, R2 = TRUE)
Does R2 represent the adjusted R^2 or just R^2?
It is closer conceptually to unadjusted.
So should I assume that it does not penalize additional variables? Meaning that the more variables I have the higher the R2 will be?
Yes.
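To make the point concrete, here is a minimal sketch using classical least squares rather than the Bayesian R2 discussed in this thread. It assumes simulated data and pure-noise predictors (all names are illustrative), and shows the in-sample R2 never decreasing as useless variables are added. The Bayesian version is computed from posterior draws, so it should behave similarly but not necessarily in a strictly monotone way.

```r
set.seed(1)
n <- 50
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)

# Ten predictors that are pure noise, unrelated to y
noise <- matrix(rnorm(n * 10), nrow = n, ncol = 10)

# In-sample R2 of y ~ x plus the first k noise columns, for k = 0..10
r2 <- sapply(0:10, function(k) {
  X <- cbind(x, noise[, seq_len(k), drop = FALSE])
  summary(lm(y ~ X))$r.squared
})

# Differences between successive R2 values are non-negative
# (up to floating-point error)
all(diff(r2) >= -1e-12)
```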
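You can compute LOO-adjusted R2 with
library(rstanarm)  # get_y(), posterior_linpred(), log_lik()
library(loo)       # relative_eff(), psis(), E_loo()
looR2 <- function(fit) {
  y <- get_y(fit)
  ypred <- posterior_linpred(fit)
  ll <- log_lik(fit)
  # chain_id assumes 4 chains of 1000 posterior draws each
  r_eff <- relative_eff(exp(ll), chain_id = rep(1:4, each = 1000))
  psis_object <- psis(log_ratios = -ll, r_eff = r_eff)
  # leave-one-out predictive mean for each observation
  ypredloo <- E_loo(ypred, psis_object, log_ratios = -ll)$value
  eloo <- ypredloo - y
  return(1 - var(eloo) / var(y))
}
round(looR2(fit), 2)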
This will be added some day to some package (hopefully before StanCon Helsinki).
I just realized that since I made that function quickly it has hard coded the number and the length of chains. I’ll change that later, but now it’s too late for me to think.
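Until that change is made, one possible way around the hard-coding is sketched below. It takes the number of chains as an argument and derives the chain length from the number of rows in the log-likelihood matrix. The names chain_ids and looR2_flexible are hypothetical, and the sketch assumes that rstanarm and loo are attached, that all chains have equal length, and that the draws are stacked chain by chain (as the original function already assumes).

```r
# Hypothetical helper: chain membership for draws stacked chain by chain
chain_ids <- function(n_draws, n_chains) {
  stopifnot(n_draws %% n_chains == 0)  # assumes equal-length chains
  rep(seq_len(n_chains), each = n_draws / n_chains)
}

# Same computation as looR2(), without hard-coded chain count and length.
# Assumes library(rstanarm) and library(loo) have been called.
looR2_flexible <- function(fit, n_chains) {
  y <- get_y(fit)
  ypred <- posterior_linpred(fit)
  ll <- log_lik(fit)
  r_eff <- relative_eff(exp(ll), chain_id = chain_ids(nrow(ll), n_chains))
  psis_object <- psis(log_ratios = -ll, r_eff = r_eff)
  ypredloo <- E_loo(ypred, psis_object, log_ratios = -ll)$value
  eloo <- ypredloo - y
  1 - var(eloo) / var(y)
}
```

With a model fitted using four chains, the call would be round(looR2_flexible(fit, n_chains = 4), 2).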
I wasn’t able to edit my post, so I’ll add here that for loo-R2 you can use either
rstanarm::loo_R2
brms::loo_R2
The first two options include a Bayesian bootstrap for generating draws that represent the epistemic uncertainty of not knowing the future data distribution. See more in "Uncertainty in Bayesian leave-one-out cross-validation based model comparison".