R^2 calculation for brm model with cumulative family type

I am conducting a power calculation for a brm model that has an ordered factor as the outcome variable and therefore uses family ‘cumulative’. I am using the WebPower function wp.regression (https://www.rdocumentation.org/packages/WebPower/versions/0.5.2/topics/wp.regression) which has been very helpful so far, except for this particular model because I need to compute the model’s R^2 using bayes_R2() and this function does not accept model input with cumulative family/ordinal outcome variable. If anyone has experience computing bayesian R^2 for this type of data I would appreciate any advice.

The bayes_R2 function in brms should work with ordinal regressions; e.g., this works for me:

fit <- brm(rating ~ period + carry,
           data = inhaler, family = cumulative("logit"),
           prior = set_prior("normal(0,5)"), chains = 2)
summary(fit)
bayes_R2(fit)

Note that R2s for logistic (Bernoulli) and cumulative (ordinal) regressions are typically lower than you would expect to see in a standard linear regression. For such models, you might want to consider a Bayesian version of the McKelvey-Zavoina R2, which provides an approximation of the R2 that would have been obtained if a linear model had have been run on observations of the continuous latent variable underlying the discrete responses (Veall and Zimmermann, 1992; Hagle and Mitchell, 1992; Veall and Zimmermann, 1994).

If of interest, I can provide a function to calculate a Bayesian version of the McKelvey-Zavoina R2 from a fitted brms model.

2 Likes

Hi @andymilne thank you for your advice! Yes I would appreciate the function for the Bayesian version of the McKelvey-Zavoina R2, if possible. Much appreciated.

Here it is. Apologies for probably clumsy R code but it should do the trick …

# Bayesian McKelvey-Zavoina R2 ------------------------------------------------
# Bayesian version of McKelvey and Zavoina's pseudo-R2 for binary and ordinal
# brms models (McKelvey and Zavoina, 1975). See also Gelman et al. (2018). This
# pseudo-R2 closely approximates the R2 that would have been obtained if a
# linear model had have been run on observations of the continuous latent
# variable underlying the discrete responses (Veall and Zimmermann, 1992; Hagle
# and Mitchell, 1992; Veall and Zimmermann, 1994).
Bayes_R2_MZ <- function(fit, ...) {
  y_pred <- fitted(fit, scale = "linear", summary = FALSE, ...)
  var_fit <- apply(y_pred, 1, var)
  if (fit$formula$family$family == "cumulative" ||
      fit$formula$family$family == "bernoulli") {
    if (fit$formula$family$link == "probit" || 
        fit$formula$family$link == "probit_approx") {
      var_res <- 1
    }
    else if (fit$formula$family$link == "logit") {
      var_res <- pi^2 / 3 
    }
  } 
  else {
    sum_fit <- summary(fit)
    sig_res <- sum_fit$spec_pars["sigma", "Estimate"]
    var_res <- sig_res^2
  } 
  R2_MZ <- var_fit / (var_fit + var_res)
  print(
    data.frame(
      Estimate = mean(R2_MZ), 
      Est.Error = sd(R2_MZ), 
      "l-95% CI" = quantile(R2_MZ, 0.025),
      "u-95% CI" = quantile(R2_MZ, 0.975),
      row.names = "Bayes_R2_MZ", 
      check.names = FALSE), 
    digits = 2)
}
2 Likes

This is very useful thanks so much