# Implementing Model Averaging

hi there,

I’m a non-statistician, new to Stan & R, coming from Python & PyMC3.

I would like to learn to implement model averaging using the weights I’ve obtained from calling
loo::loo_model_weights on a collection of RStan fitted models.

I’ve come across rethinking::ensemble but it requires models to be fit using a corresponding rethinking::map function which I’d prefer not to switch to.

Similarly, In PyMC3 this is straightforward with

pymc3.sampling.sample_posterior_predictive_w(traces, samples=None, models=None, weights=None, random_seed=None, progressbar=True)
described as:
Generate weighted posterior predictive samples from a list of models and a list of traces according to a set of weights.

The paper Using Stacking to Average Bayesian Predictive Distributions (with Discussion) has supplement material with R code but this only shows obtaining the model weights, not actually using the weights to generate a posterior accordingly.

I would be really grateful for some pointers on how to do this in RStan.

Thank you

The final outcome is the (linear) weighted average of predictions from each model.

Thank you yuling, that’s pretty straightforward.
I think I was overcomplicating it!

I have also computed the weighted average of the log-likelihoods from the posterior and fed that back into loo to obtain a WAIC value that I compare to the individual models’.

This is a bit ambiguous that what does predictions mean here. The final predictive distribution is the (linear) weighted average of the predictive distributions from each model. But we don’t have access directly to predictive distributions, but usually only to posterior draws from them- To get posterior predictive draws from the model average do this

yrep1 <- posterior_predict(fit1, draws=round(wts[1]*S))
yrep2 <- posterior_predict(fit2, draws=round(wts[2]*S))
...
yrep <- c(yrep1, yrep2, ...)


where S is the number of draws from each posterior predictive distribution. Then from yrep you can compute what you need. @nels, is this what you have done?

This is wrong. 1 ) You can’t average log-likelihoods directly, 2) the importance sampling ratio r_i^{(s)} is not anymore proportional to 1/p(y_i|\theta^{(s)}), and 3) you can’t use psis-loo for checking the effect of the weight computation. 3 would require cross.validation over the weight computation, but if we ignore that then you could compute log of average of loo-predictive densities (which have been computed for each model). I don’t think there is an easy way to do 3.

The key point is you have to take linear average on densities first and take log afterwards. From a applied perspective, you can hold out an independent test dataset and calculate the weighted predictive density on that test set.

approximation of marginal WAIC?

Thank you @avehtari for your feedback.

Yes, taking y_rep samples from each model in proportion to the weights.
(just using rstan::sampling directly, with y_rep as a generated quantity, instead of the rstanarm::posterior_predict interface in your code sample.)

Thanks, I suspected it wouldn’t be that simple.
Abandoning my attempt at loo-comparing the averaged model to individual models for time being.

Thanks again.

Hello @avehtari and team
I have followed this thread of Implementing Model Averaging, and also the guidance you provided in your paper Using Stacking to Average Bayesian Predictive Distributions but I seem to be unclear on the implementation of it.
This is what I have done;
I have fitted 3 logistic regression models as shown below and I would like to average them based on their weights using the stacking method.

fit1 <- stan_glm(outcome ~predictor1 ,
data = ds,
family = "binomial",
refresh = 0)
fit2 <- stan_glm(outcome ~ predictor2,
data = ds,
family = "binomial",
refresh = 0)
fit3 <- stan_glm(outcome ~ predictor3,
data = ds,
family = "binomial",
refresh = 0)

# log likelihood
log_lik_fit1 <- rstanarm::log_lik(fit1)
log_lik_fit2 <- rstanarm::log_lik(fit2)
log_lik_fit3 <- rstanarm::log_lik(fit3)

#stacking weights
model_weights<- loo_model_weights(list(
log_lik_fit1, log_lik_fit2, log_lik_fit3)
)
Method: stacking
------
weight
model1 0.413
model2 0.207
model3 0.380


My question:
How do I utilise the weights above to implement the model averaging?
A code snippet would be appreciated.
Thank you.

There is a code snippet in the post Implementing Model Averaging - #4 by avehtari. Is there something unclear in that?

Thank you @avehtari for that guidance and this is what have done.

yrep1 <- posterior_predict(fit1, draws=round(0.413*1000))
yrep2 <- posterior_predict(fit1, draws=round(0.207*1000))
yrep3 <- posterior_predict(fit1, draws=round(0.380*1000))

yrep <- c(yrep1, yrep2, yrep3)


I have used 1000 draws multiplied by the weight of each model. I hope this is correct.

On the number of draws, I used 1000 but I am not sure whether this is adequate.Kindly clarify.
How do I use the yrep to compute out-of-sample model performance statistics e.g. discrimination, and calibration?

It depends, see e.g. Bayesian workflow book - Digits

1 Like