I’m trying to compare a few candidate models across many different sets of data. I want to obtain a set of model-averaging weights that is reasonably stable. I’ve noticed that even with only small between-chain disagreement (Rhat < 1.1 for all parameters), many weighting schemes can give drastically different weights between Markov chains, because the weight depends in some way on the exponential of the estimated elppd.
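To make the sensitivity concrete, here is a minimal sketch (my own illustration, with made-up elppd numbers) of a pseudo-BMA-style weighting where weights are proportional to exp(elppd). A shift of a couple of nats in one chain's estimate moves the weights substantially:

```python
import numpy as np

def pseudo_bma_weights(elppd):
    """Softmax of elppd estimates: w_j proportional to exp(elppd_j)."""
    z = np.asarray(elppd, dtype=float)
    z = z - z.max()              # subtract the max to stabilise the exponential
    w = np.exp(z)
    return w / w.sum()

# Hypothetical elppd estimates for two models, from two different chains.
# Chain 2 differs from chain 1 by only 2 nats on the second model, yet the
# weight on model 1 jumps from ~0.73 to ~0.95.
chain1 = pseudo_bma_weights([-100.0, -101.0])   # ~[0.731, 0.269]
chain2 = pseudo_bma_weights([-100.0, -103.0])   # ~[0.953, 0.047]
```

This exponential amplification is exactly why small chain-to-chain wobble in elppd turns into large wobble in the weights.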

I’m wondering if there’s a robust way to account for between-chain variation in the weights (i.e. the effect that this variation has on WAIC, LOO, etc.). I know that it’s possible to compute standard errors for WAIC and LOO, but I’m not certain how these relate to the variance between chains. Is there a good way to account for this so that the obtained weights are consistent?
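For reference, the standard error I have in mind is the pointwise one reported by the loo package, SE = sqrt(n * Var(pointwise contributions)). A sketch (function name mine) of that computation:

```python
import numpy as np

def elppd_se(pointwise):
    """Standard error of a summed elppd estimate, computed as
    sqrt(n * sample variance of the n pointwise contributions)."""
    x = np.asarray(pointwise, dtype=float)
    return np.sqrt(len(x) * x.var(ddof=1))

# Hypothetical pointwise elppd contributions for n = 4 data points.
se = elppd_se([-1.0, -2.0, -3.0, -2.0])   # sqrt(4 * 2/3) ~ 1.633
```

Note this SE reflects sampling variability over data points; whether and how it captures variation between chains is exactly what I'm unsure about.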

Motivation:

I have a model with a fairly unusual structure that I am trying to fit. A series of experiments is performed, and each experiment may individually conform to one of a small number of candidate models. I want to know the value of a hyperparameter common to all experiments. However, this hyperparameter does not inform the data in the same way under each model.

If we have sets of parameters \theta_j for m models M_j and a hyperparameter \phi, and we fit these to a set of n experiments with data y_i, then I believe that the likelihood in this kind of model should look something like:

P(y|\phi)=\prod_{i=1}^n\sum_{j=1}^mP(y_i|\theta_j,\phi)P(M_j)

where the P(M_j) are our weights obtained from model averaging at the individual-experiment level. As the value of \phi is what we’re really interested in, it’s important that the obtained weights are consistent, or at least have consistent distributions that can be used as informative priors.
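For concreteness, the likelihood above can be evaluated stably in log space, taking a logsumexp over models within each experiment. This is a sketch with hypothetical inputs; `log_p` and `log_w` are my names for the per-experiment log likelihoods log P(y_i | \theta_j, \phi) and the log weights log P(M_j):

```python
import numpy as np

def log_likelihood(log_p, log_w):
    """log P(y|phi) = sum_i logsumexp_j( log P(y_i|theta_j, phi) + log P(M_j) ).

    log_p : (n, m) array of per-experiment, per-model log likelihoods
    log_w : (m,)   array of log model weights log P(M_j)
    """
    z = log_p + log_w                        # broadcast weights over experiments
    zmax = z.max(axis=1, keepdims=True)      # stabilise the inner logsumexp
    per_exp = zmax[:, 0] + np.log(np.exp(z - zmax).sum(axis=1))
    return per_exp.sum()

# Hypothetical example: one experiment, two models with equal weight 0.5.
ll = log_likelihood(np.log([[0.5, 0.25]]), np.log([0.5, 0.5]))   # log(0.375)
```

Because the weights enter multiplicatively inside each experiment's sum, any instability in the P(M_j) propagates directly into the inference on \phi, which is why I'd like the weights (or their distributions) to be stable.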