I am attempting to use stacking weights (from loo::stacking_weights) to combine the PPD’s from a variety of multilevel models fit in Rstan. However, when I examine the weighted PPD the fit with the observed data is worse than those with any one specific model. Specifically the weigthed combinaiton is much more peaked (more values around mean) and underestimates the standard deviation (all of the individual models did a decent job capturing standard deviation).

Can someone confirm that I my approach to combining the PPD’s based on stacking_weights is correct (code below). My approach is to use posteior_predict() to get PPD’s for each model, then multiply the resulting matrices by the weigths to combine them into one ppd.

Are there any theoretical reasons why a weighted combination of PPD’s would be more centered around the mean and underestimate the standard deviation? Note that there is some weak spatial dependence in my data. I have also tried creating stacking weights from kfold() cv scores with similar results. I have also attempted to do a partitioned (by spatial units) kfold, however I run into the error when Stan tries to predict new factor levels that are in the test set but not the training set (as discussed here: LOO error when k_threshold = 0.7 fits model that drops sole observation of factor level)
loo_list<list(loo3,loo4,loo5,loo6,loo7,loo8) #create a list from loo objects
lstack<loo_model_weights(loo_list,method='stacking') #create stacking weights
#Create PPDs (these correspond to the order in the loo_list)
yrep3<posterior_predict(hfit3_sh,draws=500)
yrep4<posterior_predict(hfit4_sh,draws=500)
yrep5<posterior_predict(hfit5_sh,draws=500)
yrep6<posterior_predict(hfit6_sh,draws=500)
yrep7<posterior_predict(hfit7_sh,draws=500)
yrep8<posterior_predict(hfit8_sh,draws=500)
wts<lstack[,1] #orginally I created a matrix with stacking weights, and psedo BMA weights, as per the vignette.
#Combine to create a new weigthed PPD
yrepAll2< wts[1] * yrep3 + wts[2] * yrep4 + wts[3] * yrep5+ wts[4] * yrep6 + wts[5] * yrep7+ wts[6] * yrep8
I then compare ‘yrepAll2’ with observed values using ppc_dens_overlay, et cetera.
Any suggestions are appreciated.