Stan summary R2 or adjusted R2

I am doing multiple regression in rstan and have the following code:

summary(fit, R2 = TRUE)

Does R2 represent the adjusted R^2 or just R^2?

Conceptually, it is closer to the unadjusted R^2.

So should I assume that it does not penalize additional variables? That is, the more variables I have, the higher the R2 will be?

Yes.
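To make the "more variables, higher R2" point concrete, here is a small simulation. It swaps in classical least squares (`lm()`) rather than the Bayesian R2 from the Stan fit, so it illustrates the general behaviour rather than reproducing the Stan summary. The data and the 10 pure-noise predictors are made up for illustration. For nested OLS models the unadjusted R2 can never go down when a predictor is added, while the adjusted R2 carries a penalty for the extra parameters.

```r
# Illustration with ordinary least squares, not the Bayesian R2 from Stan.
set.seed(1)
n <- 50
x1 <- rnorm(n)
y <- 1 + 0.5 * x1 + rnorm(n)             # only x1 is truly related to y
noise <- matrix(rnorm(n * 10), n, 10)    # 10 pure-noise predictors

r2 <- adj_r2 <- numeric(11)
for (k in 0:10) {
  # x1 plus the first k noise columns (k = 0 gives x1 alone)
  X <- cbind(x1, noise[, seq_len(k), drop = FALSE])
  s <- summary(lm(y ~ X))
  r2[k + 1] <- s$r.squared
  adj_r2[k + 1] <- s$adj.r.squared
}
round(cbind(n_noise = 0:10, r2, adj_r2), 3)
```

The `r2` column creeps upward even though every added column is noise; the `adj_r2` column does not have to.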

You can compute a LOO-adjusted R2 with the following function (it uses rstanarm and the loo package):

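library(rstanarm)  # get_y(), posterior_linpred(), log_lik()
library(loo)       # relative_eff(), psis(), E_loo()

looR2 <- function(fit) {
    y <- get_y(fit)
    ypred <- posterior_linpred(fit)   # draws x observations, linear predictor
    ll <- log_lik(fit)                # draws x observations, pointwise log-likelihood
    # NOTE: hard-coded for 4 chains with 1000 post-warmup draws each
    r_eff <- relative_eff(exp(ll), chain_id = rep(1:4, each = 1000))
    # Pareto-smoothed importance sampling for leaving out each observation
    psis_object <- psis(log_ratios = -ll, r_eff = r_eff)
    # LOO predictive mean for each observation
    ypredloo <- E_loo(ypred, psis_object, log_ratios = -ll)$value
    eloo <- ypredloo - y              # LOO residuals
    return(1 - var(eloo) / var(y))
}
round(looR2(fit), 2)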
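The function above approximates leave-one-out predictions with PSIS and then plugs the LOO residuals into `1 - var(eloo)/var(y)`. To see what that quantity measures, here is a sketch that swaps in plain OLS, where exact LOO residuals have a closed form: `e_i / (1 - h_i)`, with `h_i` the leverage from `hatvalues()`. The helper names `loo_r2_ols` and `loo_r2_bruteforce` are hypothetical, and the simulated data is only for illustration.

```r
# Exact leave-one-out R2 for ordinary least squares (no PSIS involved).
# For OLS, the LOO residual equals e_i / (1 - h_i), where h_i is the leverage.
loo_r2_ols <- function(X, y) {
  fit <- lm(y ~ X)
  e_loo <- residuals(fit) / (1 - hatvalues(fit))
  1 - var(e_loo) / var(y)
}

# Brute-force check: refit n times, each time leaving one observation out
loo_r2_bruteforce <- function(X, y) {
  Xd <- cbind(1, X)                      # design matrix with intercept
  e_loo <- numeric(length(y))
  for (i in seq_along(y)) {
    beta <- qr.solve(Xd[-i, , drop = FALSE], y[-i])   # least-squares fit
    e_loo[i] <- y[i] - drop(Xd[i, ] %*% beta)
  }
  1 - var(e_loo) / var(y)
}

set.seed(2)
X <- matrix(rnorm(40 * 3), 40, 3)
y <- drop(X %*% c(1, -0.5, 0)) + rnorm(40)
c(exact = loo_r2_ols(X, y), brute = loo_r2_bruteforce(X, y))
```

Both numbers should agree; with a Stan fit there is no such closed form, which is why the function above uses importance sampling instead of refitting the model n times.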

This will be added to some package at some point (hopefully before StanCon Helsinki).

I just realized that, since I wrote that function quickly, it hard-codes the number and the length of the chains. I'll change that later, but right now it's too late for me to think.
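One way to avoid the hard-coded `rep(1:4, each = 1000)` is to read the number of chains and draws per chain from the fit itself. The sketch below assumes `fit` is an MCMC-based rstanarm `stanreg` object, so that `as.array(fit)` returns an iterations x chains x parameters array, and that the rows of `log_lik(fit)` are stacked chain by chain. The helper name `chain_id_from_draws` is hypothetical.

```r
# Hypothetical helper: build the chain_id vector expected by
# loo::relative_eff() from an iterations x chains x parameters array,
# instead of hard-coding rep(1:4, each = 1000).
chain_id_from_draws <- function(draws_array) {
  stopifnot(length(dim(draws_array)) == 3)
  n_iter <- dim(draws_array)[1]    # post-warmup draws per chain
  n_chains <- dim(draws_array)[2]  # number of chains
  # Assumes log-likelihood rows are ordered: all of chain 1, then chain 2, ...
  rep(seq_len(n_chains), each = n_iter)
}

# Possible use inside looR2() (assuming an MCMC stanreg fit):
# r_eff <- relative_eff(exp(ll), chain_id = chain_id_from_draws(as.array(fit)))
```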


I wasn’t able to edit my post, so I’m adding here that for LOO-R2 you can use either

The first two options include the Bayesian bootstrap for generating draws that represent the epistemic uncertainty of not knowing the future data distribution. See more in "Uncertainty in Bayesian leave-one-out cross-validation based model comparison".
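As a rough illustration of the Bayesian bootstrap idea, the sketch below reweights the observations with Dirichlet(1, ..., 1) weights and recomputes `1 - var(eloo)/var(y)` under each set of weights, giving a distribution of LOO-R2 values instead of a single number. It assumes the LOO residuals `eloo` and the outcome `y` are already available (for example from `looR2()` above); the function names are hypothetical, the simulated inputs are made up, and the packaged implementations may differ in details.

```r
# Minimal sketch of a Bayesian bootstrap for LOO-R2.
# Weighted variance under weights w that sum to 1.
weighted_var <- function(x, w) {
  m <- sum(w * x)
  sum(w * (x - m)^2)
}

loo_r2_bayes_boot <- function(eloo, y, n_draws = 4000) {
  n <- length(y)
  r2 <- numeric(n_draws)
  for (s in seq_len(n_draws)) {
    g <- rexp(n)          # normalized Exp(1) draws are Dirichlet(1, ..., 1)
    w <- g / sum(g)
    r2[s] <- 1 - weighted_var(eloo, w) / weighted_var(y, w)
  }
  r2
}

set.seed(3)
y <- rnorm(100)
eloo <- 0.6 * rnorm(100)  # stand-in for real LOO residuals
draws <- loo_r2_bayes_boot(eloo, y)
quantile(draws, c(0.05, 0.5, 0.95))
```

The spread of `draws` reflects uncertainty about the data distribution, not posterior uncertainty in the model parameters.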
