How to interpret bayes_R2 in a brms meta-analysis?

Hi Stan/brms folk,

I want to know if adding moderators to my meta-analysis improves the model. As I understand it, there are two sources of variability in a hierarchical model of the observed effects \hat \theta_k, namely sampling error \sigma^2_k and true effect heterogeneity \tau^2 such that:

\hat \theta_k \sim N(\mu, \ \sigma^2_k + \tau^2)
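To make those two variance components concrete, here is a small base-R simulation of that model (all numbers invented, loosely echoing the fitted values below; no brms required):

```r
# Simulate observed effects under the random-effects meta-analysis model:
#   theta_hat_k ~ N(mu, sigma_k^2 + tau^2)
set.seed(1)
K    <- 65                     # number of studies
mu   <- -0.14                  # true mean effect
tau  <- 0.48                   # between-study SD (heterogeneity)
se_k <- runif(K, 0.1, 0.4)     # known sampling SEs (the sigma_k)

theta_k <- rnorm(K, mu, tau)        # true study-specific effects
d_post  <- rnorm(K, theta_k, se_k)  # observed effects

# The marginal variance of d_post combines both components:
# roughly tau^2 + mean(se_k^2)
var(d_post)
```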

If I correctly understand the summary output of the brms meta-analysis below, I get an estimate for \tau but not \sigma_k.

fit <- brm(
  formula = d_post | se(se_post) ~ 1 + (1|study),
  data = df,
  prior = 
    prior(inv_gamma(0.5, 1), class = sd) +
    prior(normal(0, 3), class = b)
  # file = "results/stress_ma.rds"
)
Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: d_post | se(se_post) ~ 1 + (1 | study) 
   Data: stress (Number of observations: 65) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Group-Level Effects: 
~study (Number of levels: 65) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.48      0.07     0.36     0.63 1.00     1850     2412

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    -0.14      0.10    -0.35     0.05 1.00     1316     1467

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.00      0.00     0.00     0.00   NA       NA       NA

The estimate for \tau is sd(Intercept), and the estimate for \sigma_k would normally be sigma, but this is zero, presumably because there is no within-study variation left to model (is this right, even though se_post is supplied in the formula syntax?).

If I understand all that correctly, then my problem arises when I add a covariate to the model (a “meta-regression”). Sometimes when I do that, the estimate for \tau increases instead of decreases, indicating there is more residual between-study variability after adding the covariate. I think I can understand how this might happen, but I have also noticed it happens even when the output of bayes_R2() substantially increases (e.g., from about R^2 = 0.2 to R^2 = 0.5)! In OLS, R^2 is the proportion of variance explained, and I’m dimly aware that this is not quite the case for bayes_R2(). But how can the residual between-study variability increase while bayes_R2 also increases after adding a covariate? Given there is no estimate for \sigma_k^2 in these models, I would expect \tau^2 to be the only source of residual variance, so R^2 should decrease to reflect the increase in residual variance when I add a covariate.

tl;dr: After adding a covariate, why does my estimate of \tau increase (so more residual between-study variability) while at the same time R^2 increases (so more variability is explained?), and sigma remains at zero? Is my covariate explaining one source of variability while increasing another?

Looking forward to filling my gaps in understanding here!



I think my problem has something to do with this: bayes_R2 and conditioning on random effects - #16 by avehtari

brms::bayes_R2.brmsfit() estimates R2 as:

Var(pred) / (Var(pred) + Var(pred - y))

which is the “residual based R^2” in Bayesian R2 and LOO-R2 and R-squared for Bayesian regression models, according to @avehtari
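For concreteness, that residual-based R^2 can be computed by hand, per posterior draw, from draws of the expected values. A base-R sketch (the matrix of simulated draws below stands in for posterior_epred(fit); all values are made up):

```r
# Residual-based Bayesian R^2, computed per posterior draw s:
#   R2_s = var(pred_s) / (var(pred_s) + var(y - pred_s))
set.seed(2)
n <- 65                     # observations (studies)
S <- 1000                   # posterior draws
y <- rnorm(n)               # observed outcomes

# Fake posterior draws of the expected value: rows = draws, cols = obs,
# standing in for posterior_epred(fit)
pred <- matrix(0.6 * rep(y, each = S) + rnorm(S * n, sd = 0.5),
               nrow = S)

r2_draws <- apply(pred, 1, function(p) var(p) / (var(p) + var(y - p)))

# Summarised over draws, as bayes_R2() reports it
c(Estimate = mean(r2_draws), Est.Error = sd(r2_draws))
```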

But I’m thinking about it in terms of the “model based R^2”:

Var(pred) / (Var(pred) + sigma^2)

When sigma = 0, this proportion is identically 1, so adding a covariate cannot change it.

However even this is an incomplete understanding, because Var(pred) can include predictions based on all fixed + random effects (i.e., re.form = NULL), OR predictions based only on fixed effects (re.form = NA).
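To see how the two prediction types give different R^2 values, here is a toy computation (base R, simulated quantities, sampling error ignored; in the fitted model these would come from predictions with re.form = NA vs. re.form = NULL):

```r
set.seed(3)
K <- 65
x <- rnorm(K)              # a moderator
b <- 0.5                   # its fixed effect
u <- rnorm(K, sd = 0.48)   # study random intercepts (SD = tau)
y <- b * x + u             # "observed" effects

pred_fixed <- b * x        # re.form = NA:   fixed effects only
pred_full  <- b * x + u    # re.form = NULL: fixed + random effects

r2 <- function(p, y) var(p) / (var(p) + var(y - p))
r2(pred_fixed, y)  # moderate: the moderator explains part of the variance
r2(pred_full, y)   # exactly 1 here: the random effects absorb the rest
```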

Adding a covariate will change predictions based on the fixed effects, while changes to \tau will only change predictions that include the random effects.

All this is to say that adding a covariate in a meta-regression can increase the residual between-study variability (\tau^2) and at the same time improve the predictions based on the fixed effects (i.e., bayes_R2(re.form = NA)), because these are two separate sources of variability.

I think sigma is 0 because that’s the default for se(). Use d_post | se(se_post, sigma = TRUE) to model the residual standard deviation (if that’s indeed what you want). See ?se and read through the “Additional response information” section of ?brmsformula.


Thanks. If I add sigma = TRUE then sigma and sd(Intercept) are roughly equal in the model output. I think that makes sense to me, since there is no other source of variance in the model inputs beyond the between-study variance (\tau^2).