Weighted combination of PPDs has worse fit than any individual PPD

I am attempting to use stacking weights (from loo::stacking_weights) to combine the PPDs from a variety of multilevel models fit in RStan. However, when I examine the weighted PPD, its fit to the observed data is worse than that of any single model. Specifically, the weighted combination is much more peaked (more values around the mean) and underestimates the standard deviation (each individual model did a decent job of capturing the standard deviation).

  1. Can someone confirm that my approach to combining the PPDs based on stacking_weights is correct (code below)? My approach is to use posterior_predict() to get a PPD for each model, then multiply the resulting matrices by the weights and sum them into one PPD.

  2. Are there any theoretical reasons why a weighted combination of PPDs would be more concentrated around the mean and underestimate the standard deviation? Note that there is some weak spatial dependence in my data. I have also tried creating stacking weights from kfold() CV scores, with similar results. I also attempted a partitioned (by spatial units) k-fold, but I run into an error when Stan tries to predict factor levels that are in the test set but not the training set (as discussed here: LOO error when k_threshold = 0.7 fits model that drops sole observation of factor level).

loo_list <- list(loo3, loo4, loo5, loo6, loo7, loo8)  # create a list of loo objects
lstack <- loo_model_weights(loo_list, method = 'stacking')  # create stacking weights

#-- Create PPDs (these correspond to the order in loo_list)
yrep3 <- posterior_predict(hfit3_sh, draws = 500)
yrep4 <- posterior_predict(hfit4_sh, draws = 500)
yrep5 <- posterior_predict(hfit5_sh, draws = 500)
yrep6 <- posterior_predict(hfit6_sh, draws = 500)
yrep7 <- posterior_predict(hfit7_sh, draws = 500)
yrep8 <- posterior_predict(hfit8_sh, draws = 500)

wts <- lstack[, 1]  # originally I created a matrix with stacking weights and pseudo-BMA weights, as per the vignette

# Combine to create a new weighted PPD
yrepAll2 <- wts[1] * yrep3 + wts[2] * yrep4 + wts[3] * yrep5 + wts[4] * yrep6 + wts[5] * yrep7 + wts[6] * yrep8

I then compare ‘yrepAll2’ with observed values using ppc_dens_overlay, et cetera.

Any suggestions are appreciated.

No, it’s wrong.

Change the posterior_predict() calls so that each model contributes a number of draws proportional to its weight:

yrep3 <- posterior_predict(hfit3_sh, draws = round(wts[1] * 500))
yrep4 <- posterior_predict(hfit4_sh, draws = round(wts[2] * 500))
...
# note: the rounded counts may not sum to exactly 500; adjust one of them if needed

This way you are weighting the distributions rather than the individual values.
Then combine the draws:

yrep <- rbind(yrep3, yrep4, ...)  # rbind (not c) keeps the draws-by-observations structure

And you will have 500 draws from the weighted mixture distribution. This also explains what you observed: pointwise averaging of draws forms a convex combination of random variables, whose variance (roughly sum_i w_i^2 * sigma_i^2 for independent draws) is smaller than the weighted average of the individual variances, so the averaged PPD comes out over-peaked and under-dispersed.
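A quick numerical illustration of the difference (sketched here in Python with NumPy for brevity; the arithmetic is identical in R). Two hypothetical "models" produce draws with the same mean but different spreads; pointwise averaging shrinks the standard deviation below a proper combination, while the mixture preserves the weighted spread:

```python
import numpy as np

rng = np.random.default_rng(42)

# Two "model" PPDs with the same mean but sds of 1 and 2.
draws1 = rng.normal(0.0, 1.0, size=5000)
draws2 = rng.normal(0.0, 2.0, size=5000)
w = np.array([0.5, 0.5])  # stacking weights (sum to 1)

# Wrong: pointwise weighted average of the draws.
# Its variance is w1^2*1 + w2^2*4 = 1.25, so sd ~ 1.12.
averaged = w[0] * draws1 + w[1] * draws2

# Right: a mixture -- take a share of draws from each model.
# Its variance is w1*1 + w2*4 = 2.5, so sd ~ 1.58.
n = 5000
n1 = int(round(w[0] * n))
mixture = np.concatenate([draws1[:n1], draws2[: n - n1]])

print(averaged.std())  # noticeably smaller than the mixture's sd
print(mixture.std())   # matches the properly weighted spread
```

The averaging shortcut always deflates the spread whenever the weights sum to one, which matches the over-peaked density you saw in ppc_dens_overlay.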

Thank you!!! Note that I based my approach on this code block example in the help for loo.stanreg in rstanarm 2.17.4:

https://www.rdocumentation.org/packages/rstanarm/versions/2.17.4/topics/loo.stanreg

# averaging predictions
wts <- loo_model_weights(loo_list)
yrep1 <- posterior_predict(fit1)
yrep2 <- posterior_predict(fit2)
yrep3 <- posterior_predict(fit3)
wt_avg_yrep <- wts[1] * yrep1 + wts[2] * yrep2 + wts[3] * yrep3

Is the above approach still appropriate for generating weighted predictions? Also, are there plans to introduce more examples or convenience functions for working with stacking weights?

Thank you again for the fast response!

Ouch. That was a mistake by @jonah, and I think it should be fixed in 2.18 (which is a bit delayed), but I’ll check that it will be fixed. Thanks for reporting this!

We have vague plans, but with low priority, as we don't use these weights much ourselves. I haven't yet encountered a problem where I could not use continuous model expansion instead, but we added stacking weights to the loo package because we saw people using WAIC weights, which are worse. If you write some useful convenience functions in your project, we would be happy to see them and consider including them in loo.

Oh, we do have an issue for a convenience function to combine draws: https://github.com/stan-dev/loo/issues/83