R2 for Bayesian models v. Nakagawa delta method

I am looking to sensibly partition the variance from a mixed-effects logit model fit via stan. I understand typical Bayesian R^2 (Bayesian R2 and LOO-R2) is:

R^2 = \frac{\sigma^2_{\mu}}{\sigma^2_{\mu} + \sigma^2_{\epsilon}}.

I am however interested in obtaining estimates of the share of variance explained by each model component - so suppose a set of fixed effects with variance \sigma^2_f and a random effect with variance \sigma^2_a, then I’d like to calculate:

R_{f}^2 = \frac{\sigma^2_{f}}{\sigma^2_{f} + \sigma^2_{a} + \sigma^2_{\epsilon}}, \mbox{ and } R_{a}^2 = \frac{\sigma^2_{a}}{\sigma^2_{f} + \sigma^2_{a} + \sigma^2_{\epsilon}}.

Classic Bayesian approaches for logistic regression calculate \sigma^2_{\epsilon} at the predicted-probability level, as opposed to the latent logit level. I try to stick to this and calculate the component-specific variance with an approach which uses the predicted probabilities at each level (similarly to what is suggested in the old Gelman (2006) paper in the context of linear models). For every posterior simulation, I calculate:

\widehat{\sigma^2}_a = Var \mbox{ ilogit}( \theta_a),

But this has the very undesirable property to produce R^2 s that do not sum to 1 ( and at times are larger than 1 on their own), regardless of which method for calculating the residual variance is chosen.

The Nakagawa et al. (2017) paper offers an interesting solution using the corrected delta method to estimate the residual variance at the latent (logit) level. Based on other posts in this forum (e.g. here) I have seen people have a familiarity with the paper. However it is formulated from a frequentist prospective, and i’m having trouble applying this to a Bayesian model.

The formula they derive for the latent-scale variance is:

\sigma^2_\epsilon \approx \frac{1}{\hat{p}(1-\hat{p})};\\ \hat{p} \approx \mbox{ilogit}(\beta_0) + 0.5\sigma^2_\tau \frac{exp(\beta_0)(1-exp(\beta_0)}{(1+exp(\beta_0))^3} ;

where \beta_0 is the model intercept; \sigma^2_\tau is the sum of all random-effect variance components in the model, and \hat{p} is the best guess for the expected value of our bernoulli-distributed outcome variable y.

If I calculate the R^2 replacing \sigma^2_\tau with the variance of my linear predictor Var{(\mu)} , I get the following:

Which would seem to suggest this is a sensible approach which both gives R2 which is consistent with other R2 measures, and allows me to partition the variance as I want to.

I have the following questions which I hope folks on the forum may help with:

  1. The Nakagawa formula is very insistent on this distinction between random and fixed effects - namely \sigma^2_\tau is meant to be the sum of the variances of random effect components (not including the fixed-effects variance) - but from a Bayesian prospective I do not think there is any actual difference – everything is a ‘random’ effect (or better a ‘modeled’ effect) – so I thought it would be sensible to also include the variance of the fixed effect components in \sigma^2_\tau – hence my choice to replace it with Var{(\mu)} .
    Is this a sensible approach in your estimation ?

  2. Relatedly, the paper suggests it is possible to replace \beta_0 with the fixed effects – namely \beta_0 + \sum^K_{k=1}\beta_k x_{i,k}. Here I do not understand why I could not / should not include the random effects in this predictor — again I’m not sure about how the conceptual difference between frequentist and bayesian random effects interacts with the Nakagawa formula;

  3. Is it somehow possible to obtain the variance partitioning I want - namely a set of R^2 for each model component of interest which also sum to 1 - within the loo framework ?

Thanks in advance for your help and time.

I don’t have answers to your specific questions, but if you haven’t seen it, this paper proposes a full framework for variance partitioning in a Bayesian context that you may find useful, and which addresses corner cases found in a lot of more complex models.

Doors and corners of variance partitioning in statistical ecology
Torsti Schulz, Marjo Saastamoinen, Jarno Vanhatalo
bioRxiv 2021.10.17.464682; doi: Doors and corners of variance partitioning in statistical ecology | bioRxiv


Thanks for sharing that paper - this is very interesting and relevant - especially the methods to account for covariance amongst linear terms. On the terms of this paper, my example above would be a strictly ‘diagonal variance partition’.

Still, if I understand the paper correctly, the authors do not deal with a full variance-partitioning exercise, but rather stick to partitioning the linear predictor. This is cool and very useful, but we still cant calculate R^2 if the residuals are non-lineariseable.

The authors say:

The approach is not limited to partitioning the variation in the linear predictor, but can also be used for the more traditional partitioning of explained variation, R^2, when working with a linear model or a model with linearizable residuals.

I guess one could combine the Nakagawa’s formula to linearise the variance of a Bernoulli-distributed variable, and the linear-predictor partitioning proposed in this paper to obtain a fully partitionable measure of R^2, but my questions about how exactly to do that in a Bayesian context remain, unfortunately, unanswered.