Weighted combination of PPDs has worse fit than any individual PPD

I am attempting to use stacking weights (from loo::stacking_weights) to combine the PPDs from a variety of multilevel models fit in RStan. However, when I examine the weighted PPD, its fit to the observed data is worse than that of any single model. Specifically, the weighted combination is much more peaked (more values around the mean) and underestimates the standard deviation (each individual model did a decent job of capturing the standard deviation).

  1. Can someone confirm that my approach to combining the PPDs based on stacking_weights is correct (code below)? My approach is to use posterior_predict() to get a PPD for each model, then multiply the resulting matrices by the weights and sum them into one PPD.

  2. Are there any theoretical reasons why a weighted combination of PPDs would be more concentrated around the mean and underestimate the standard deviation? Note that there is some weak spatial dependence in my data. I have also tried creating stacking weights from kfold() CV scores, with similar results. I also attempted a partitioned (by spatial units) k-fold, but I run into an error when Stan tries to predict factor levels that are in the test set but not the training set (as discussed here: LOO error when k_threshold = 0.7 fits model that drops sole observation of factor level).

loo_list <- list(loo3, loo4, loo5, loo6, loo7, loo8)  # create a list of loo objects
lstack <- loo_model_weights(loo_list, method = 'stacking')  # create stacking weights

#-- Create PPDs (these correspond to the order in loo_list)
yrep3 <- posterior_predict(hfit3_sh, draws = 500)
yrep4 <- posterior_predict(hfit4_sh, draws = 500)
yrep5 <- posterior_predict(hfit5_sh, draws = 500)
yrep6 <- posterior_predict(hfit6_sh, draws = 500)
yrep7 <- posterior_predict(hfit7_sh, draws = 500)
yrep8 <- posterior_predict(hfit8_sh, draws = 500)

wts <- lstack[, 1]  # originally I created a matrix with stacking weights and pseudo-BMA weights, as per the vignette

# Combine to create a new weighted PPD
yrepAll2 <- wts[1] * yrep3 + wts[2] * yrep4 + wts[3] * yrep5 + wts[4] * yrep6 + wts[5] * yrep7 + wts[6] * yrep8

I then compare ‘yrepAll2’ with observed values using ppc_dens_overlay, et cetera.

Any suggestions are appreciated.

No, it’s wrong.

Change the posterior_predict() calls so that each model contributes a number of draws proportional to its weight:

yrep3 <- posterior_predict(hfit3_sh, draws = round(wts[1] * 500))
yrep4 <- posterior_predict(hfit4_sh, draws = round(wts[2] * 500))
...
# note: the rounded counts may not sum to exactly 500; adjust one of them if needed

This way you are weighting the distributions rather than the individual values.
Then combine the draws:

yrep <- rbind(yrep3, yrep4, ...)  # rbind (not c) keeps the draws-by-observations structure

And you will have 500 draws from the weighted mixture distribution. This also explains what you observed: pointwise averaging of draws forms a convex combination of random variables, whose variance (roughly sum_i w_i^2 * sigma_i^2 for independent draws) is smaller than the weighted average of the individual variances, so the averaged PPD comes out over-peaked and under-dispersed.
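A quick numerical illustration of the difference (sketched here in Python with NumPy for brevity; the arithmetic is identical in R). Two hypothetical "models" produce draws with the same mean but different spreads; pointwise averaging shrinks the standard deviation below a proper combination, while the mixture preserves the weighted spread:

```python
import numpy as np

rng = np.random.default_rng(42)

# Two "model" PPDs with the same mean but sds of 1 and 2.
draws1 = rng.normal(0.0, 1.0, size=5000)
draws2 = rng.normal(0.0, 2.0, size=5000)
w = np.array([0.5, 0.5])  # stacking weights (sum to 1)

# Wrong: pointwise weighted average of the draws.
# Its variance is w1^2*1 + w2^2*4 = 1.25, so sd ~ 1.12.
averaged = w[0] * draws1 + w[1] * draws2

# Right: a mixture -- take a share of draws from each model.
# Its variance is w1*1 + w2*4 = 2.5, so sd ~ 1.58.
n = 5000
n1 = int(round(w[0] * n))
mixture = np.concatenate([draws1[:n1], draws2[: n - n1]])

print(averaged.std())  # noticeably smaller than the mixture's sd
print(mixture.std())   # matches the properly weighted spread
```

The averaging shortcut always deflates the spread whenever the weights sum to one, which matches the over-peaked density you saw in ppc_dens_overlay.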

Thank you!!! Note that I based my approach on this code block example in the help for loo.stanreg in rstanarm 2.17.4:

https://www.rdocumentation.org/packages/rstanarm/versions/2.17.4/topics/loo.stanreg

# averaging predictions
wts <- loo_model_weights(loo_list)
yrep1 <- posterior_predict(fit1)
yrep2 <- posterior_predict(fit2)
yrep3 <- posterior_predict(fit3)
wt_avg_yrep <- wts[1] * yrep1 + wts[2] * yrep2 + wts[3] * yrep3

Is the above approach still appropriate for generating weighted predictions? Also, are there plans to introduce more examples or convenience functions for working with stacking weights?

Thank you again for the fast response!

Ouch. That was a mistake by @jonah, and I think it should be fixed in 2.18 (which is a bit delayed), but I’ll check that it will be fixed. Thanks for reporting this!

We have vague plans, but with low priority, as we don't use these weights much ourselves. I haven't yet encountered a problem where I could not use continuous model expansion instead, but we added stacking weights to the loo package because we saw people using WAIC weights, which are worse. If you write some useful convenience functions in your project, we would be happy to see them and consider including them in loo.

Oh, we do have an issue for a convenience function to combine draws: https://github.com/stan-dev/loo/issues/83