LOO-IC and ELPD_loo weights

Hi all,

I am running a stacking analysis with 8 models in the stack. Individually, the LOO values are fine (with k_threshold=.7). Here are the weights.

# Method: stacking
## ------
##    weight
## m1 0.000 
## m2 0.038 
## m3 0.247 
## m4 0.000 
## m5 0.487 
## m6 0.000 
## m7 0.000 
## m8 0.227

However, when I compare the ELPD_loo values, I get the following:

compOverall <- loo_compare(loo_list)
print(compOverall, simplify = FALSE, digits = 3)
##    elpd_diff se_diff elpd_loo se_elpd_loo p_loo   se_p_loo looic   se_looic
## m5   0.000     0.000 -63.071    6.054       8.008   2.054  126.143  12.109 
## m3  -2.093     4.184 -65.165    6.423       7.846   2.422  130.329  12.845 
## m2  -2.796     3.987 -65.868    8.395       7.409   2.358  131.735  16.789 
## m8  -5.499     6.142 -68.571    9.060      10.968   4.635  137.141  18.120 
## m1  -5.584     4.342 -68.655    7.775       5.552   2.012  137.311  15.551 
## m4  -8.640     5.666 -71.711    9.478      11.585   4.119  143.422  18.956 
## m7  -8.658     4.267 -71.730    7.975       8.104   2.562  143.459  15.949 
## m6  -8.711     3.921 -71.783    5.500      11.408   2.553  143.566  10.999

Notice that Model 8 has an ELPD_loo weight of 0.227, but it is ranked 4th in the comparison, whereas I would expect it to be ranked 3rd since its weight is much larger than Model 2's. I understand that the weights are the solution to an optimization problem, but it seems odd that ranking the models by stacking weights does not give the same ranking as LOOIC. Thoughts are appreciated.

Thank you,

`loo_compare` orders the models based on `elpd_diff` only. The stacking weights can be quite different from this ordering. See the supplement of "Bayesian Hierarchical Stacking: Some Models Are (Somewhere) Useful" for an example.

Based on `elpd_diff` and `se_diff`, models 1, 2, 3, 5, and 8 have very similar predictive performance, but their predictive distributions are such that a combination of 2, 3, 5, and 8 is expected to have better predictive performance. Model 1 is likely very similar to one of 2, 3, 5, or 8 and can therefore get weight 0.
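To make this concrete, here is a minimal toy sketch in base R. The numbers are made up for illustration and are not the models from this thread, and the optimizer is a plain `optim()` call, not what the loo package uses internally. Model B is a slightly worse near-copy of A, while C is worse overall but predicts well exactly where A predicts poorly.

```r
# Toy pointwise predictive densities (hypothetical numbers, not the thread's data).
# Rows are observations, columns are models.
n_half <- 100
dens <- cbind(
  A = c(rep(0.50, n_half), rep(0.10, n_half)),  # good on the first half
  B = c(rep(0.48, n_half), rep(0.10, n_half)),  # near-copy of A, slightly worse
  C = c(rep(0.10, n_half), rep(0.45, n_half))   # good on the second half
)

# Ranking by summed log density (analogous to ordering by elpd)
elpd <- colSums(log(dens))

# Map unconstrained parameters to simplex weights (softmax, first parameter fixed at 0)
to_weights <- function(theta) {
  z <- c(0, theta)
  w <- exp(z - max(z))  # subtract the max for numerical stability
  w / sum(w)
}

# Stacking objective: negative log score of the weighted mixture of predictive densities
neg_log_score <- function(theta) {
  -sum(log(dens %*% to_weights(theta)))
}

fit <- optim(par = c(0, 0), fn = neg_log_score, method = "BFGS")
w <- to_weights(fit$par)
names(w) <- colnames(dens)

print(round(elpd, 1))  # A is best, then B, then C
print(round(w, 3))     # B gets almost no weight, C gets a substantial one
```

In this toy setup B is ranked second by summed log density but adds nothing once A is in the mix, so its weight goes to about zero, whereas C is ranked last yet receives a large weight because it complements A. With real models you would pass the loo objects to `loo_model_weights()` instead of optimizing by hand.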

Thanks, Aki. That’s really helpful.
