Elpd and weight give very different conclusions

I fit a non-linear custom model (an asymmetric logit) with random effects (R and C), and I want to compare the different fitted models. The fitting works well, but I have difficulty understanding the output of loo_compare() and loo_model_weights():
First I do:

```
llok <- loo_compare(list(noR_noC = m1_loo, 
                         R_noC = m2_loo, 
                         noR_C = m3_loo, 
                         R_C = m4_loo))
```

I get:

```
           elpd_diff  se_diff
R_C             0.0       0.0 
noR_C          -0.9       1.7 
R_noC        -132.5      50.1 
noR_noC      -168.9      58.6 
```

Now:

```
loo_model_weights(list(noR_noC = m1_loo, 
                       R_noC = m2_loo, 
                       noR_C = m3_loo, 
                       R_C = m4_loo))
```

```
Method: stacking
------
                 weight
noR_noC           0.079 
R_noC             0.149 
noR_C             0.000 
R_C               0.772 
```

How is it possible that R_C and noR_C, which are nearly indistinguishable based on elpd_diff, get such different weights?

Thanks a lot

I edited your post to include triple ticks to make the code and output easier to read.

Because they are two different methods. By default, loo_model_weights() uses stacking, which is designed to build a mixture of predictive distributions with good predictive performance; it is not meant for model comparison (see Using Stacking to Average Bayesian Predictive Distributions (with Discussion)). It seems that R_C and noR_C have very similar predictive distributions, so one of them can get weight 0. It also seems that although R_noC and noR_noC have worse predictive performance on their own, their predictive distributions are such that including them in the mixture with a small weight improves the estimated predictive performance. For more on why the weight is likely to be 0 for one model when two models have very similar predictive distributions, see the appendix of Bayesian Hierarchical Stacking: Some Models Are (Somewhere) Useful.
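To see why stacking can behave this way, here is a toy sketch in Python rather than R (it does not use the loo package; the three "models" are fixed normal predictive densities, an assumption made purely for illustration). Models A and B are nearly identical, and C is much worse on its own, yet stacking still gives C a nonzero weight because it covers a part of the data the other two miss, while the split between A and B barely changes the stacked objective:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp, softmax
from scipy.stats import norm

rng = np.random.default_rng(1)
n = 2000
# Simulated data: 80% from N(0, 1), 20% from N(4, 1).
comp = rng.random(n) < 0.8
y = np.where(comp, rng.normal(0.0, 1.0, n), rng.normal(4.0, 1.0, n))

# Three hypothetical "models" with fixed normal predictive densities:
# A and B are nearly identical; C is poor alone but covers the minority mode.
mus = np.array([0.0, 0.02, 4.0])
lpd = norm.logpdf(y[:, None], loc=mus[None, :], scale=1.0)  # pointwise log densities, shape (n, 3)

elpd = lpd.sum(axis=0)  # analogue of the elpd values compared by loo_compare()

def neg_stacking_objective(theta):
    # Parametrize the weight simplex via softmax so the optimization is unconstrained.
    w = softmax(np.append(theta, 0.0))
    # Stacking maximizes the average log density of the mixture predictive distribution.
    return -np.mean(logsumexp(lpd + np.log(w), axis=1))

res = minimize(neg_stacking_objective, np.zeros(2), method="BFGS")
w = softmax(np.append(res.x, 0.0))

print("elpd:            ", np.round(elpd, 1))
print("stacking weights:", np.round(w, 3))
```

Here elpd for A and B differs only trivially compared to the gap to C, yet the stacking objective is nearly flat in how weight is divided between A and B (so one of them can end up near 0), while C keeps a clearly nonzero weight despite its much worse stand-alone elpd.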


It is very clear! Thanks a lot.
