I am conducting a power calculation for a brm model that has an ordered factor as the outcome variable and therefore uses family ‘cumulative’. I am using the WebPower function wp.regression (https://www.rdocumentation.org/packages/WebPower/versions/0.5.2/topics/wp.regression) which has been very helpful so far, except for this particular model because I need to compute the model’s R^2 using bayes_R2() and this function does not accept model input with cumulative family/ordinal outcome variable. If anyone has experience computing bayesian R^2 for this type of data I would appreciate any advice.
The bayes_R2
function in brms should work with ordinal regressions; e.g., this works for me:
fit <- brm(rating ~ period + carry,
data = inhaler, family = cumulative("logit"),
prior = set_prior("normal(0,5)"), chains = 2)
summary(fit)
bayes_R2(fit)
Note that R2s for logistic (Bernoulli) and cumulative (ordinal) regressions are typically lower than you would expect to see in a standard linear regression. For such models, you might want to consider a Bayesian version of the McKelvey-Zavoina R2, which provides an approximation of the R2 that would have been obtained if a linear model had have been run on observations of the continuous latent variable underlying the discrete responses (Veall and Zimmermann, 1992; Hagle and Mitchell, 1992; Veall and Zimmermann, 1994).
If of interest, I can provide a function to calculate a Bayesian version of the McKelvey-Zavoina R2 from a fitted brms model.
Hi @andymilne thank you for your advice! Yes I would appreciate the function for the Bayesian version of the McKelvey-Zavoina R2, if possible. Much appreciated.
Here it is. Apologies for probably clumsy R code but it should do the trick …
# Bayesian McKelvey-Zavoina R2 ------------------------------------------------
# Bayesian version of McKelvey and Zavoina's pseudo-R2 for binary and ordinal
# brms models (McKelvey and Zavoina, 1975). See also Gelman et al. (2018). This
# pseudo-R2 closely approximates the R2 that would have been obtained if a
# linear model had have been run on observations of the continuous latent
# variable underlying the discrete responses (Veall and Zimmermann, 1992; Hagle
# and Mitchell, 1992; Veall and Zimmermann, 1994).
Bayes_R2_MZ <- function(fit, ...) {
y_pred <- fitted(fit, scale = "linear", summary = FALSE, ...)
var_fit <- apply(y_pred, 1, var)
if (fit$formula$family$family == "cumulative" ||
fit$formula$family$family == "bernoulli") {
if (fit$formula$family$link == "probit" ||
fit$formula$family$link == "probit_approx") {
var_res <- 1
}
else if (fit$formula$family$link == "logit") {
var_res <- pi^2 / 3
}
}
else {
sum_fit <- summary(fit)
sig_res <- sum_fit$spec_pars["sigma", "Estimate"]
var_res <- sig_res^2
}
R2_MZ <- var_fit / (var_fit + var_res)
print(
data.frame(
Estimate = mean(R2_MZ),
Est.Error = sd(R2_MZ),
"l-95% CI" = quantile(R2_MZ, 0.025),
"u-95% CI" = quantile(R2_MZ, 0.975),
row.names = "Bayes_R2_MZ",
check.names = FALSE),
digits = 2)
}
This is very useful thanks so much
Thanks for sharing this super helpful function, @andymilne! Do you think there would be an easy way to modify this to compute marginal R2 (related to the fixed effects of the model only)?
As far as I understand, if you have random effects in the model then the Bayesian McKelvey-Zavoina R2 calculated by this method is the R2 of the fixed effects conditioned on an average “participant” (or the average of whatever the grouping factor(s) are). The method ignores the random effects in the model, so I don’t think it corresponds to either a “marginal” or “conditional” R2 as these terms are sometimes applied in recent literature about R2s.
However, if you were to run the “same” model but without any random effects, the Bayesian McKelvey-Zavoina R2 on that model would be (I think, probably depends on the precise definition of “marginal”) the marginal R2. So that is the direct answer to your question.
I am not sure if an adapted version of the Bayesian McKelvey-Zavoina R2 can calculate the R2 resulting from both the fixed and random effects (which I think would correspond to what is sometimes called, confusingly, the conditional R2), although there may well be some way to do that.
That makes sense, thanks for your help!
Actually, if it’s helpful for anyone, I think your function does return “conditional R2” (variance explained by the complete model). In
y_pred <- fitted(fit, scale = "linear", summary = FALSE, ...)
,
re_formula
here is set to NULL
by default, according to the brms::fitted()
documentation. As a result, all random (i.e., group-level) effects are considered in the prediction. To obtain “marginal R2,” I think one would only need to change this argument to NA
to exclude random effects in the prediction.
Does this seem about right?
I really appreciate that this function approximates R2 as if we were modeling a linear variable underlying discrete responses.