Can I obtain bayes_R2 for reach level using rstanarm?

techniques

#1

I have a three level hierarchical data structure with normal distributed outcome. I successfully fitted my model using stan_glmer, but need the R2 for the explained variance for each of the levels. In rstanarm, I was only able to get one R2 value for the whole model using bayes_R2.

Is there a way to obtain this for each level, in the way of Gelman’s 2004 “Bayesian measures of explained variance and pooling in multilevel (hierarchical) models” paper?

Alternatively, is there a way to obtain the level-specific R2 from a lme4 fitted model?

Thanks a lot in advance for your help!
Carolina


#2

Not from the bayes_R2 function. But the concept is just

ratio <- apply(posterior_linpred(post, transform = TRUE), 1, FUN = var) / 
               apply(posterior_predict(post), 1, FUN = var))

so you could do the variance calculation over whatever subgroups you want.


#3

Thanks a lot!
I did this but am not getting the expected results, so I must be doing something wrong when trying it subgroup-specific.

For getting the subgroup-specific posterior_linpred and posterior_predict, I did the following (example for highest group level):

post_id <- posterior_predict(fit, re.form = ~ (1|id))
postlin_id <- posterior_linpred(fit, transform = TRUE, re.form = ~ (1|id))

Where fit is my stan_glmer object for the whole model.

Then I computed the R2:
rsquared.id <- 1 - mean (ratio)

The results that I got were way greater than expected.

Hope my mistake is easy to spot.
Thanks a lot for your help!


#4

And ratio is apply(postlin_id, 1, var) / apply(post_id, 1, var)?


#5

Yes, this is the ratio I specified


#6

I found the code for the function bayes_R2.stanreg (https://github.com/jgabry/bayes_R2/blob/master/bayes_R2.tex) to be:

bayes_R2 <- function(fit) {
y <- get_y(fit)
yhat <- posterior_linpred(fit)
e <- -1 * sweep(yhat, 2, y)
var_yhat <- apply(yhat, 1, var)
var_e <- apply(e, 1, var)
var_yhat / (var_yhat + var_e)
}

and was also able to apply it to specific levels/subgroups (same result as bayes_R2(fit,re.form= ~ (1|identr)).

However, these results (for full model and also for separate levels) were different from the R2 using this ratio:

ratio <- apply(posterior_linpred(post, transform = TRUE), 1, FUN = var) /
apply(posterior_predict(post), 1, FUN = var))

According to Gelman 2004 (https://www.econstor.eu/bitstream/10419/142498/1/EERI_RP_2004_04.pdf), R2 is obtained by subtracting this ratio from 1. However, results are not consistent.

Am I measuring different things here? and how can I interpret them? Which method would be the correct one for calculating the expalined variance by levels in a hierarchical model?

Thanks for your help!


#7

The question of what is the R^2 within a subgroup is a bit different from the question how much variability does a term add to the model (which is what I think you are asking now). You can do as.data.frame(fit) to get the posterior distribution of the standard deviation in the intercepts across levels of identr and compare that to the standard deviation of the measurement errors for example. In more complicated models than this, it can get tricky, which is why I always suggest just doing things the Bayesian way

PPD_full <- posterior_predict(fit) # what the full model predicts
var_PPD_full <- apply(PPD_full, 1, var) # the variance over the observations
PPD_part <- posterior_predict(fit, re.form = NA) # what the model predicts without (1|identr)
var_PPD_part <- apply(PPD_part, 1, var) # the variance over the observations
ratio <- 1 - var_PPD_part / var_PPD_full # the contribution of (1|identr)
summary(ratio) # or hist(ratio)

#8

Thanks a lot for your answer. I am sorry for the confusion. My analyses consist of two parts. First, calculating how much variability a term adds to the model (the ICC in the empty model, just with my random intercepts) I have these results already. Then for the second part, I want to quantify the variance explained by my predictors in each level (I think this is the R2 within my subgroups).

My dataset is actually structured in 3 levels: participants (identr), meal type (fco_code), and data level.

Is the following the correct use of the R2 by subgroups measure?

#R2 level 3 (identr)
post_identr <- posterior_predict(fit,re.form = ~ (1|identr))
postlin_identr <- posterior_linpred(fit,transform=TRUE,re.form = ~ (1|identr))
r2_identr <- mean(apply(postlin_identr,1,FUN=var))/ mean(apply(post_identr,1,FUN=var))

#R2 level 2 (fco_code)
post_fco <- posterior_predict(fit,re.form= ~ (1|fco_code:identr))
postlin_fco <- posterior_linpred(fit,transform= TRUE, re.form = ~ (1|fco_code:identr))
r2_fco <- mean(apply(postlin_fco,1,FUN=var))/mean(apply(post_fco,1,FUN=var))

#R2 level 1 (data level)
postlin_data <- posterior_linpred(fit,transform=TRUE, re.form=NA)
post_data <- posterior_predict(fit, re.form=NA)
r2_data <- mean(apply(postlin_data,1,FUN=var))/ mean(apply(post_data,1,FUN=var))

For example in level 3, does the result give only the variance explained by the predictors in my fit model within this subgroup (or does this include also the observation level)?

And in the case of level 2, this R2 means the variance explained by the predictors only on the fco_code level? (note that these are random intercepts within level 3).

My doubts arise because I am getting unexpected results: among the predictors in the model are sex, which I would expect to explain an important part of the variance in the participant (identr) level, however, the value comes out to be around 0.08.

Thank you!!