I’m fitting complex categorical models with 40+ population-level effects and 2 group-level effects. I use PSIS-LOO+ for model comparisons. However, when I want to estimate the overall goodness-of-fit of a given model, the elpd_loo statistic computed by PSIS-LOO+ is not that helpful because it is not on an easily interpretable scale. One diagnostic that is on an easily interpretable [0,1] scale is the Adjusted McFadden’s R-Squared. In a frequentist setting, it is defined as
$$1 - \frac{\mathrm{LogLik}_{M1} - k}{\mathrm{LogLik}_{M0}}$$
where k is the difference in the number of parameters between the working model and the null model. In my case, this number is around 130. However, Gelman et al. (2015: 172) state that we cannot use k with a Bayesian non-linear model with non-flat priors. The question then becomes: how do I determine the Bayesian equivalent of k? Should I just use p_loo or p_waic? Puzzlingly, those statistics are both almost twice as large as the frequentist k, even though I’d expect them to be smaller. Is it because they’ve been doubled to match the deviance scale? Should I simply take p_loo or p_waic and divide it by 2 to get a Bayesian k? If so, which one?
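For concreteness, the frequentist definition above can be written as a small function. This is only a sketch in Python, used here to keep the example self-contained: the function name and the input numbers are hypothetical, chosen for illustration, and are not taken from any fitted model.

```python
def adjusted_mcfadden_r2(loglik_model: float, loglik_null: float, k: float) -> float:
    """Adjusted McFadden pseudo R-squared: 1 - (LogLik_M1 - k) / LogLik_M0.

    loglik_model: maximized log-likelihood of the working model (M1)
    loglik_null:  maximized log-likelihood of the null model (M0)
    k:            number of extra parameters in M1 relative to M0
    """
    # For a categorical outcome the log-likelihood is a sum of log
    # probabilities, so the null value is expected to be negative.
    if loglik_null >= 0:
        raise ValueError("loglik_null is expected to be negative")
    return 1.0 - (loglik_model - k) / loglik_null


# Hypothetical values, for illustration only.
print(round(adjusted_mcfadden_r2(-600.0, -1000.0, 130), 3))  # approximately 0.27
```

Without the penalty (k = 0) the same inputs give 0.4, which shows how strongly a k of around 130 can pull the statistic down.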
References
Gelman, Andrew, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, and Donald B. Rubin. 2015. Bayesian Data Analysis. Third edition. Boca Raton: CRC Press.
I get an error message saying “‘loo_R2’ is not defined for unordered categorical models.” Hence I am back to wondering how to calculate k for $1 - \frac{\mathrm{lpd}_{M1} - k}{\mathrm{lpd}_{M0}}$.
That appendix shows the equations and code for LOO-R2, so you could adapt the code for your model.
You could also try replacing the lpd’s with elpd_loo’s, in which case you don’t need the adjustment. But for me it’s not clear how well this R2 or the adjusted McFadden’s R2 works for unordered categorical models, so its interpretation can be as difficult as interpreting elpd_loo directly (having something on a [0,1] scale doesn’t guarantee that it’s interpretable).
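Written out, and assuming the same M1 (working model) and M0 (null model) notation as in the formulas earlier in the thread, the substitution suggested here would look roughly like this. Treat it as a sketch of the idea rather than an established definition; the symbol $R^2_{\mathrm{elpd}}$ is a label introduced only for illustration.

$$
R^2_{\mathrm{elpd}} = 1 - \frac{\mathrm{elpd}_{\mathrm{loo},\,M1}}{\mathrm{elpd}_{\mathrm{loo},\,M0}}
$$

No separate k term appears, because elpd_loo is used in place of the penalized quantity (lpd - k).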
Getting the code to work is not altogether simple, because that code seems to be designed for models that were fit with something other than brms (which is what I’m using). The current error message is “trying to get slot "sim" from an object of a basic class ("NULL") with no slots”. In any case, even if I do manage to get the code to work, I’m not sure which values of the resulting LOO-R2 count as good or bad.
In fact, the specific appeal of the McFadden statistic is that such guidelines do exist – McFadden says values over 0.2 represent “an excellent fit” for unordered categorical models (McFadden 1978: 307). That’s a convenient benchmark.
Your suggestion to substitute elpd_loo in both the numerator and the denominator sounds excellent, and the resulting statistics seem favorable yet realistic (around 0.3). If I do end up using this improvised Bayesian “McFadden” statistic, may I cite Vehtari (p.c.) for the idea? That would improve its credibility.
References
McFadden, Daniel. 1978. Quantitative methods for analysing travel behavior of individuals: some recent developments. In Hensher, David and Peter Stopher (eds.). Behavioural Travel Modelling. London: Croom Helm.
If you think the McFadden statistic is useful for you, then I can guarantee that elpd_loo is a sensible replacement for (lpd - k), as it automatically takes into account the effective complexity of M1 and M0. You can cite me. If you run any experiments, I’d be happy if you could share the results, and eventually I can include this in the CV-FAQ, too.
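As a rough illustration of the statistic discussed in this thread, the computation reduces to a ratio of two elpd_loo estimates. The sketch below is in Python only to keep it self-contained; the function name and the input values are hypothetical, and in practice the two inputs would be the elpd_loo estimates obtained for the working model and for whatever null model M0 is chosen.

```python
def mcfadden_from_elpd(elpd_model: float, elpd_null: float) -> float:
    """McFadden-style statistic with elpd_loo in place of (lpd - k).

    elpd_model: elpd_loo estimate for the working model (M1)
    elpd_null:  elpd_loo estimate for the null model (M0)
    """
    # For a categorical outcome, elpd_loo is a sum of log predictive
    # probabilities, so the null value is expected to be negative.
    if elpd_null >= 0:
        raise ValueError("elpd_null is expected to be negative")
    return 1.0 - elpd_model / elpd_null


# Hypothetical elpd_loo estimates, for illustration only.
print(round(mcfadden_from_elpd(-700.0, -1000.0), 3))  # approximately 0.3
```

A value of 0 means the working model predicts held-out observations no better than the null model, and a negative value means it predicts them worse, so this version is not strictly bounded below by 0.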