I am trying to understand the bayes_R2 function to estimate the proportion of deviance explained for new data and I have been through the information from Gelman et al…
My concern was raised when analyzing my own data. I will illustrate my issue, using the model suggested in the vignette from Paul Bürkner on custom response distributions with the study case using the cbpp data (from the lme4 package).
The model that he suggests in a first step is the following:
fit1 <- brm(incidence | trials(size) ~ period + (1|herd), data = cbpp, family = binomial())
To play around with the data, I wanted to perform the same model using bernouilli as the probability distribution instead of the Binomial distribution. I therefore performed a similar model but on a transformed version of the initial dataset in a way that each row is one individual observed:
fit2 <- brm(incidence ~ period + (1|herd), data = cbpp_individual, family = bernouilli())
the 2 models give the same estimates and errors on the parameters, which is obviously what I would have expected. However, the bayes_R2 give completely different results for fit1=0.64 and fit2=0.10.
I cannot figure out what may explains this gap. I would be delighted to be enlightened on this point.
Yes, Bayes_R2 in brms is based on the online paper and should yield the same results.
I agree with Aki that using different observation models make the results of Bayes_R2 incomparable.
You should use the same response vector in both models to make Bayes_R2 comparisons meaningful.
my feeling in that the bayes_R2 is valid for the model with a Bernouilli distribution but not with the Binomial. Indeed, doing the same kind of exploration on my own data, I found for bayesian models, a R2=39% with a binomial distribution, dropping to 7% with a bernouilli distribution. However, if I do the same model in a frequentist framework, I find about the same R2 as with a bernouilli, around 7%.
I am trying to understand the mathematics that may explain that difference. I am thinking loud now, but could not it be linked to the fact that, as you wrote Paul, “A drawback of the binomial model is that – after taking into account the linear predictor – its variance is fixed to Var(yi)=Tipi(1−pi). All variance exceeding this value cannot be not taken into account by the model.”?
One potential problem I see with binomial models and Bayes_R2 is that responses are not on the same scale if trials is not constant across observations. I will have to take a closer look at this again, but for now I would say you should probably indeed prefer the bernoulli version out of the two.
The code in the online appendix does not support Binomial in all cases. It’s better to copy the code from @jonah’s more careful implementation in rstantools (I think Jonah did commit that in github in December).
The default method in rstantools has no special handling for binomial models, but the method in rstanarm for stanreg objects has. However, this special handling is done in brms anyway (which is not visible in bayes_R2.brmsfit as it happens on a deeper level), so that brms and rstanarm yield the same results of bayes_R2 for binomial models.
We’ll finally have the updated version in the next rstanarm release coming soon. It’s been implemented on GitHub for a while. The implementation (including for binomial models) is here:
and follows the method in the published version of our paper (not my personal GitHub repo):
While looking at the stanreg code I noticed that there might be an inefficient pattern in it in the form of mu_pred <- mu_pred %*% diag(trials). If we have lots of observations, diag(trials) will be huge and the matrix multiplication will be extremely slow. Here is a quick and dirty benchmark:
library(Matrix)
library(pryr)
N <- 10000
S <- 1000
set.seed(1234)
trials <- rpois(N, 1)
object_size(trials)
trials_diag <- diag(trials)
object_size(trials_diag)
trials_mat <- matrix(trials, nrow = S, ncol = N, byrow = TRUE)
object_size(trials_mat)
eta <- matrix(rnorm(S * N), nrow = S, ncol = N)
object_size(eta)
# current approach in bayes_R2.stanreg
system.time(
res1 <- eta %*% trials_diag
)
object_size(res1)
# used by brms
system.time(
res2 <- eta * trials_mat
)
object_size(res2)
# using the Matrix package
system.time(
res3 <- eta %*% Diagonal(x = trials)
)
object_size(res3)
# compare results
all.equal(res1, res2)
all.equal(res1, as.matrix(res3), check.attributes = FALSE)