I’m a non-statistician, new to Stan & R, coming from Python & PyMC3.

I would like to learn to implement model averaging using the weights I’ve obtained from calling loo::loo_model_weights on a collection of RStan fitted models.

I’ve come across rethinking::ensemble but it requires models to be fit using a corresponding rethinking::map function which I’d prefer not to switch to.

Similarly, In PyMC3 this is straightforward with

pymc3.sampling.sample_posterior_predictive_w(traces, samples=None, models=None, weights=None, random_seed=None, progressbar=True)
described as: Generate weighted posterior predictive samples from a list of models and a list of traces according to a set of weights.

Thank you yuling, that’s pretty straightforward.
I think I was overcomplicating it!

I have also computed the weighted average of the log-likelihoods from the posterior and fed that back into loo to obtain a WAIC value that I compare to the individual models’.

This is a bit ambiguous that what does predictions mean here. The final predictive distribution is the (linear) weighted average of the predictive distributions from each model. But we don’t have access directly to predictive distributions, but usually only to posterior draws from them- To get posterior predictive draws from the model average do this

where S is the number of draws from each posterior predictive distribution. Then from yrep you can compute what you need. @nels, is this what you have done?

This is wrong. 1 ) You can’t average log-likelihoods directly, 2) the importance sampling ratio r_i^{(s)} is not anymore proportional to 1/p(y_i|\theta^{(s)}), and 3) you can’t use psis-loo for checking the effect of the weight computation. 3 would require cross.validation over the weight computation, but if we ignore that then you could compute log of average of loo-predictive densities (which have been computed for each model). I don’t think there is an easy way to do 3.

The key point is you have to take linear average on densities first and take log afterwards. From a applied perspective, you can hold out an independent test dataset and calculate the weighted predictive density on that test set.

Yes, taking y_rep samples from each model in proportion to the weights.
(just using rstan::sampling directly, with y_rep as a generated quantity, instead of the rstanarm::posterior_predict interface in your code sample.)

Thanks, I suspected it wouldn’t be that simple.
Abandoning my attempt at loo-comparing the averaged model to individual models for time being.

Hello @avehtari and team
I have followed this thread of Implementing Model Averaging, and also the guidance you provided in your paper Using Stacking to Average Bayesian Predictive Distributions but I seem to be unclear on the implementation of it.
This is what I have done;
I have fitted 3 logistic regression models as shown below and I would like to average them based on their weights using the stacking method.

I have used 1000 draws multiplied by the weight of each model. I hope this is correct.

On the number of draws, I used 1000 but I am not sure whether this is adequate.Kindly clarify.
How do I use the yrep to compute out-of-sample model performance statistics e.g. discrimination, and calibration?