How to describe Bayesian stacking weights?


Hi there!

I am writing an article based on model selection by stacking of predictive distributions. I have two difficulties:

  • I am not a statistician, so I am far from understanding every theoretical subtlety behind this method;
  • I write for ecologists, who are now used to model selection. However, the ecological community generally uses pointwise information criteria such as AIC (sometimes DIC).

So I am trying to find the clearest and most convincing way to describe this approach. For the moment, the result is:

Models were compared by means of weights based on stacking of predictive distributions. This method, related to Bayesian model averaging, estimates model weights by maximizing the leave-one-out predictive density of the complete model containing all proposed sub-models (XX). The higher the weight of a model, the closer it is to the model providing the best predictions about new data coming from the same underlying generating process. This method includes uncertainty about every model during weight computation and is one of the least biased methods in Bayesian model selection and one of the least sensitive to overfitting (XX).

XX being Yao et al. 2017.

I’m ready to take any advice, comments or other references! :)

Thank you!


A non-negative vector of weights summing to 1 that maximizes expected utility when used to weight the predictions of the individual estimated models, where utility is defined as log predictive density of future data.
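
To make that definition concrete, here is a minimal sketch of how such weights could be computed. It assumes you already have a table of pointwise log predictive densities, one row per observation and one column per model, ideally from leave-one-out cross-validation as an estimate of performance on future data. The function name `stacking_weights` and the toy numbers are hypothetical, and the simple fixed-point (EM-style) iteration is just one way to maximize the objective over the simplex, not necessarily the optimizer any particular package uses.

```python
import math


def stacking_weights(lpd, n_iter=2000, tol=1e-10):
    """Estimate stacking weights from pointwise log predictive densities.

    lpd[i][k] is the log predictive density of observation i under model k
    (assumed here to be a leave-one-out estimate). The returned weights are
    non-negative, sum to 1, and aim to maximize
        sum_i log( sum_k w[k] * exp(lpd[i][k]) ).
    """
    n, n_models = len(lpd), len(lpd[0])

    # Subtract each row's maximum before exponentiating for numerical
    # stability; rescaling a row only shifts the objective by a constant.
    dens = []
    for row in lpd:
        row_max = max(row)
        dens.append([math.exp(v - row_max) for v in row])

    # Start from equal weights and iterate the fixed-point update.
    w = [1.0 / n_models] * n_models
    for _ in range(n_iter):
        new_w = [0.0] * n_models
        for row in dens:
            mix = sum(wk * d for wk, d in zip(w, row))
            for k in range(n_models):
                new_w[k] += w[k] * row[k] / mix
        new_w = [v / n for v in new_w]
        change = max(abs(a - b) for a, b in zip(w, new_w))
        w = new_w
        if change < tol:
            break
    return w


# Toy example (made-up values): 4 observations, 2 candidate models.
toy_lpd = [
    [-1.0, -2.5],
    [-0.8, -0.9],
    [-2.0, -1.1],
    [-1.5, -1.4],
]
print(stacking_weights(toy_lpd))
```

A model that predicts every held-out observation better than its competitors ends up with essentially all of the weight, while models that are each good on different parts of the data share it. That is why the weights describe the best predictive combination rather than the probability that any single model is true.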