LOO-IC and ELPD_loo weights

David_Kaplan · January 30, 2024, 3:45pm

Hi all,

I am running a stacking analysis with 8 models in the stack. Individually, the LOO values are fine (with k_threshold=.7). Here are the weights.

## Method: stacking
## ------
##    weight
## m1 0.000 
## m2 0.038 
## m3 0.247 
## m4 0.000 
## m5 0.487 
## m6 0.000 
## m7 0.000 
## m8 0.227

However, when I compare the ELPD_loo values, I get the following:

compOverall <- loo_compare(loo_list)
print(compOverall, simplify = FALSE, digits = 3)
##    elpd_diff se_diff elpd_loo se_elpd_loo p_loo   se_p_loo looic   se_looic
## m5   0.000     0.000 -63.071    6.054       8.008   2.054  126.143  12.109 
## m3  -2.093     4.184 -65.165    6.423       7.846   2.422  130.329  12.845 
## m2  -2.796     3.987 -65.868    8.395       7.409   2.358  131.735  16.789 
## m8  -5.499     6.142 -68.571    9.060      10.968   4.635  137.141  18.120 
## m1  -5.584     4.342 -68.655    7.775       5.552   2.012  137.311  15.551 
## m4  -8.640     5.666 -71.711    9.478      11.585   4.119  143.422  18.956 
## m7  -8.658     4.267 -71.730    7.975       8.104   2.562  143.459  15.949 
## m6  -8.711     3.921 -71.783    5.500      11.408   2.553  143.566  10.999

Notice that Model 8 has a stacking weight of 0.227 but is ranked 4th in the comparison. I would have expected it to be ranked 3rd, since its weight is much larger than that of Model 2 (0.038). I understand that the weights are the solution of an optimization problem, but it seems odd that ranking the models by stacking weight does not give the same ranking as LOOIC. Thoughts are appreciated.

Thank you,

avehtari · January 31, 2024, 1:21pm

`loo_compare` orders the models based on `elpd_diff` only. The stacking weights can be quite different from this ordering. See the supplement of "Bayesian Hierarchical Stacking: Some Models Are (Somewhere) Useful" for an example.

Based on `elpd_diff` and `se_diff`, models 1, 2, 3, 5, and 8 have very similar predictive performance. However, their predictive distributions are such that a combination of models 2, 3, 5, and 8 is expected to predict better than any single one of them. Model 1 is likely to be very similar to one of 2, 3, 5, or 8, and thus can have weight 0.
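To make this concrete, here is a toy sketch in base R. It does not use the eight models from this thread: the three models `m_A`, `m_B`, `m_C` and their pointwise log predictive densities are made up for illustration. It also solves the stacking objective with a plain `optim()` call rather than the solver in the loo package, so treat it as a rough illustration of the idea and not as what `loo_model_weights()` does internally. `m_B` is a slightly worse near-copy of `m_A`, while `m_C` has the lowest total elpd but predicts well exactly where `m_A` predicts poorly.

```r
# Toy pointwise log predictive densities: 100 observations, 3 hypothetical models.
grp <- rep(c(1, 2), times = c(70, 30))   # two regions of the data
lpd_A <- ifelse(grp == 1, -1.0, -2.5)    # good in region 1, poor in region 2
lpd_B <- lpd_A - 0.05                    # near-copy of m_A, slightly worse everywhere
lpd_C <- ifelse(grp == 1, -1.8, -1.0)    # weaker in region 1, best in region 2
lpd <- cbind(m_A = lpd_A, m_B = lpd_B, m_C = lpd_C)

# Ranking by total elpd, which is the ordering loo_compare reports.
elpd <- colSums(lpd)
elpd_rank <- names(sort(elpd, decreasing = TRUE))

# Stacking weights maximize sum_i log(sum_k w_k * exp(lpd[i, k])) over the simplex.
# A softmax parameterization (first logit fixed at 0) keeps the weights
# non-negative and summing to 1 during unconstrained optimization.
softmax <- function(theta) {
  z <- exp(c(0, theta))
  z / sum(z)
}
neg_log_score <- function(theta) {
  w <- softmax(theta)
  -sum(log(exp(lpd) %*% w))
}
fit <- optim(rep(0, ncol(lpd) - 1), neg_log_score, method = "BFGS")
w <- setNames(softmax(fit$par), colnames(lpd))

print(elpd)         # totals used for the elpd ranking
print(elpd_rank)    # models ordered from highest to lowest elpd
print(round(w, 3))  # stacking weights
```

In this setup `m_B` ranks second by elpd but receives almost no weight, because it adds nothing that `m_A` does not already provide, while the last-ranked `m_C` receives a substantial weight because it complements `m_A`. That is the same pattern as models 1 and 8 above. For real fits, `loo_model_weights()` with `method = "stacking"` computes the weights from the LOO pointwise values.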

David_Kaplan · January 31, 2024, 2:32pm

Thanks, Aki. That’s really helpful.
