Model averaged parameter estimates

I’m using Stan to fit clines of allele frequency across a hybrid zone. There are multiple cline models that can be used to fit this sort of data. The models share the same core parameters (the center and width of the hybrid zone), but vary in their inclusion of other parameters to describe the tails of the clines.

I’d like to account for uncertainty in model choice and obtain model-averaged estimates of the parameters I care most about (cline width and cline center). But I’m unsure how best to go about this. In searching this forum and the Stan Google groups, it seems mixture models are the preferred way of doing model averaging within Stan, but I’m baffled about how to apply this to my models. Attached are two of the .stan models I’m using and a dataset from which to estimate clines. Any advice on how to begin making mixture models from these would be much appreciated.

Finally, a related but non-Stan-specific question: my (probably naive) expectation when I began thinking about model averaging was that I could construct a sort of “model-averaged posterior” for each parameter by randomly drawing samples from the posterior distributions of each model, with the chance of drawing from a particular model being equal to that model’s Akaike weight (calculated from WAIC). This idea came from Richard McElreath’s Statistical Rethinking book, where he uses the same technique to make an ensemble of posterior predictive checks from multiple models. I wondered whether that could work for parameter estimates as well, so I could just avoid the mixture-modeling process.
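For concreteness, the weight-proportional resampling scheme described above can be sketched as follows (Python rather than R for brevity; the draws and weights here are made-up stand-ins, not output from the attached models):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical posterior draws of cline width from two fitted models
# (stand-ins for what Stan would return for the two .stan files).
draws_m1 = rng.normal(10.0, 1.0, size=4000)
draws_m2 = rng.normal(12.0, 1.5, size=4000)

# Hypothetical model weights (e.g. Akaike/WAIC weights); must sum to 1.
weights = np.array([0.7, 0.3])

# For each averaged draw, first pick a model with probability equal to
# its weight, then take a draw from that model's posterior.
n_total = 4000
model_idx = rng.choice(2, size=n_total, p=weights)
pooled = np.where(model_idx == 0,
                  rng.choice(draws_m1, size=n_total),
                  rng.choice(draws_m2, size=n_total))

print(pooled.mean())  # falls between the two model means, nearer model 1's
```

The resulting `pooled` vector is a finite mixture of the two posteriors with mixing proportions equal to the model weights.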


fixed_none.stan (2.3 KB)
fixed_right.stan (3.3 KB)
cline_data.csv (383 Bytes)

A mixture model is most sensible when it is a natural description of the data-generating process. In your case it seems you have more of a set of approximate models. That corresponds to the M-open case, where Bayesian stacking — Using Stacking to Average Bayesian Predictive Distributions (with Discussion) — is better (and a mixture model can fail badly, especially with small data).

Yes, but as discussed and demonstrated in Using Stacking to Average Bayesian Predictive Distributions (with Discussion), stacking is better. The paper also describes Pseudo-BMA+, which has the same computational burden as WAIC weights but better performance, because it takes into account the uncertainty about the future. Also, instead of WAIC it’s better to use the more stable and easier-to-diagnose PSIS-LOO: [1507.04544] Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC

Thanks for the response! I went through the papers, and while I can’t say I fully understand all the math and the notation, I’ve made some progress. Working from the model stacking paper you link to, I’ve managed to calculate the Pseudo-BMA weights by calculating the elpd_loo for each model with the loo R package and then using equation 8 (or the log-normal approximation below it) to get the weights.
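The Pseudo-BMA weight calculation described above amounts to a softmax over the elpd_loo values (equation 8 of the stacking paper, without the Bayesian-bootstrap correction that turns it into Pseudo-BMA+). A minimal sketch, in Python with hypothetical elpd_loo values:

```python
import numpy as np

def pseudo_bma_weights(elpd_loo):
    """Pseudo-BMA weights: w_k proportional to exp(elpd_loo_k).
    Subtracting the max before exponentiating avoids overflow and
    leaves the normalized weights unchanged."""
    elpd_loo = np.asarray(elpd_loo, dtype=float)
    z = elpd_loo - elpd_loo.max()
    w = np.exp(z)
    return w / w.sum()

# Hypothetical elpd_loo values for the two cline models.
w = pseudo_bma_weights([-120.3, -122.1])
print(w)  # the model with higher elpd_loo gets the larger weight
```

In R the same computation is `exp(elpd - max(elpd)) / sum(exp(elpd - max(elpd)))` applied to the `elpd_loo` estimates from the loo package.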

As for true model stacking, I tried to use the model_weights function mentioned in the supplemental material to the paper to get the model weights, but it doesn’t seem to exist in the version of the loo package that’s available from CRAN (1.1.0).

Whether I get the model weights from Pseudo-BMA or model stacking, I’m confused about the next step. Once I have the weights, do I make the model-averaged predictive distribution the way I mentioned above (sampling from the posterior distributions of the models according to these weights), or am I still missing something?

Thanks again!


It’s in a branch on GitHub and will be on CRAN soon, but we need to implement tests for the package first. There have been some unfortunate delays, but I hope it will be on CRAN in a couple of weeks. Or you can install it from GitHub.

Yes, for predictive distributions: stacking and Pseudo-BMA+ are meant for combining predictive distributions.
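Combining predictive distributions with the weights can be done by the same mixture-sampling idea, applied to each model’s posterior predictive draws rather than its parameter draws. A sketch with made-up draws and weights (Python for brevity; in practice the draws would come from Stan’s generated quantities):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical posterior predictive draws: S draws x N observations per model.
S, N = 1000, 20
pred_m1 = rng.normal(0.4, 0.05, size=(S, N))
pred_m2 = rng.normal(0.5, 0.08, size=(S, N))

weights = np.array([0.6, 0.4])  # hypothetical stacking / Pseudo-BMA+ weights

# For each combined draw, pick one model with probability equal to its
# weight and keep that model's predictive draw for all observations.
pick = rng.choice(2, size=S, p=weights)
combined = np.where(pick[:, None] == 0, pred_m1, pred_m2)

print(combined.shape)  # (1000, 20): a weighted mixture of the two predictives
```

Each row of `combined` comes entirely from one model, so across rows the result is the weighted mixture of the two models’ predictive distributions.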

Posterior distributions of parameters are different from posterior predictive distributions. Even if you have shared parameters, it is not guaranteed that their role is the same in different models, and dependencies between shared and non-shared parameters, as well as possible model misspecification, may produce surprising results. See, e.g., “Learning about physical parameters: The importance of model discrepancy”

and the references therein.