bayes_R2 and conditioning on random effects

I have a question regarding bayes_R2(). From the docs, I understand that re.form can be used to condition or refrain from conditioning on random effects in mixed models:

The default, NULL, indicates that all estimated group-level parameters are conditioned on.

However, I don’t quite understand the output. Similar to a frequentist framework, I would expect the conditional R2, i.e. the R2 for the model taking random effects into account, to be higher than the “marginal” R2 that only considers fixed effects. In other word, I would expect the point estimate for bayes_R2(m, re.form = NULL) to be higher than for bayes_R2(m, re.form = NA).

library(rstanarm)
m <- stan_lmer(Sepal.Length ~ Petal.Length + (1 | Species), data = iris, refresh = 0, seed = 333)

# The default, NULL, indicates that all estimated 
# group-level parameters are conditioned on.
mean(bayes_R2(m, re.form = NULL))
#> [1] 0.8278072

# "marginal" R2?
mean(bayes_R2(m, re.form = NA))
#> [1] 0.9515886

Is there anything I am missing?

2 Likes

(bumping)

I know about R2, but I’m clueless what re.form does. Can you make scatter plots of Sepal.Length vs. the predictions with “condition or refrain from conditioning on random effects”? That would help to see the the different behaviors.

1 Like

Hey! Here’s the output :)

library(rstanarm)

model <- rstanarm::stan_lmer(Sepal.Length ~ Petal.Length + (1|Species), data=iris, refresh=0)


# The default (includes random factor)
ypred <- rstanarm::posterior_predict(model, re.form = NULL)
ypred <- sapply(as.data.frame(ypred), median)
plot(x=iris$Sepal.Length, y=ypred)

# Do not take into account the Species
ypred <- rstanarm::posterior_predict(model, re.form = NA)
ypred <- sapply(as.data.frame(ypred), median)
plot(x=iris$Sepal.Length, y=ypred, col="red")

Created on 2021-01-31 by the reprex package (v0.3.0)

I still don’t understand what re.form does, but the predictions in the second plot are surprising as group of predictions go below 4, while the target value in the data doesn’t go below 4.4 (and then ypred has unexpected high variance). Can you explain that?

I’m not exactly sure of what it is supposed to do, but at least it seems to give similar results to lme4:

library(lme4)

model <- lmer(Sepal.Length ~ Petal.Length + (1|Species), data=iris)

ypred <- predict(model, re.form = NULL)
plot(x=iris$Sepal.Length, y=ypred)

ypred <- predict(model, re.form = NA)
plot(x=iris$Sepal.Length, y=ypred, col="red")

Created on 2021-02-01 by the reprex package (v0.3.0)

The original arg explanation says that:

re.form specify which random effects to condition on when predicting. If NULL , include all random effects; if NA or ~0 , include no random effects.

If I’m not wrong, the predictions are mostly driven by the population effects than the individual levels (see this thread).

In terms of the R2, because a mixed model has different levels of variance, I would expect a higher R2 when conditioning on random effects - that R2 takes all sources of uncertainty into account and thus has a “better explained variance”. bayes_R2(m, re.form = NA) should only explain the variance from the fixed effects, and hence should be lower. This is what you also see from functions that compute the R2 for mixed models:

library(lme4)
#> Loading required package: Matrix
m <- lmer(Sepal.Length ~ Petal.Length + (1 | Species), data = iris)
performance::r2(m)
#> # R2 for Mixed Models
#> 
#>   Conditional R2: 0.969
#>      Marginal R2: 0.658
1 Like

Maybe @paul.buerkner who worked with mixed models might have some idea?

re.form = NA sets all random effects to 0, that is, is still a “conditional” R2 but not condition on the estimates random effects but conditioned on an “average” random effects level. This is not, in general a marginal formulation because marginal would mean integrated over the random effects distritbution. Only in linear models, that conditional-on-zero and marginal formulations fall together.

library(rstanarm)
m <- stan_lmer(Sepal.Length ~ Petal.Length + (1 | Species), data = iris, refresh = 0, seed = 333)

mean(bayes_R2(m, re.form = NA))
#> [1] 0.9515886

Thanks Paul! If I understand you correctly, the above R2 is conditioned on an average random effect level. But how should we interpret it, in comparison with mean(bayes_R2(m, re.form = NULL)) = 0.82?

In the frequentist marginal/conditional R2 framework, the two R2s can be interpreted as the explanatory power of the full model and that of only the fixed effects, from which one can derive the part of the random effects. What is the correct interpretation for the two Bayesian R2s?