Sampling Several Modes

Imagine you want to obtain MCMC samples from both modes of a bimodal posterior density, where the two modes are distinct. To compare these two modes, methods like the WAIC could be used. We would obtain two outcomes WAIC_1 and WAIC_2 based on the MCMC samples coming from the respective mode.
Now my question: Is there a best practice for obtaining samples from the two models corresponding to these two modes?
The first way to do it would be to start several chains in different starting points, check which chains sampled from which mode (assuming each chain does not jump between modes) and then use the samples corresponding to the respective modes by picking the correct chains.
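As a rough illustration of this first approach, here is a minimal sketch that sorts chains by mode. It assumes the modes can be told apart by one scalar parameter and a known boundary between them; the function name `assign_chains_to_modes`, the boundary value, and the simulated draws are all hypothetical.

```python
import numpy as np

def assign_chains_to_modes(draws, boundary=0.0):
    """Label each chain by the mode it sampled.

    draws: array of shape (n_chains, n_draws) for one parameter.
    Returns 0 for chains entirely below the boundary, 1 for chains
    entirely above it, and -1 for chains that crossed the boundary.
    """
    draws = np.asarray(draws)
    above = draws > boundary
    labels = np.full(draws.shape[0], -1)
    labels[above.all(axis=1)] = 1
    labels[(~above).all(axis=1)] = 0
    return labels

rng = np.random.default_rng(1)
# Simulated stand-in for MCMC output: chains 0 and 1 sit near -3,
# chains 2 and 3 sit near +3 (assumed values, for illustration only).
draws = np.stack([rng.normal(m, 0.5, size=500) for m in (-3, -3, 3, 3)])

labels = assign_chains_to_modes(draws)
mode_a = draws[labels == 0].ravel()  # pooled draws from the lower mode
mode_b = draws[labels == 1].ravel()  # pooled draws from the upper mode
```

Chains labelled -1 jumped between modes and would need to be inspected rather than pooled.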
The second way (assume this is possible) would be to restrict parameters in such a way that we obtain two restricted posterior functions with one mode each. Then, sampling could be done respectively on both posteriors.
Is there a preferred way or are these two ways interchangeable?

I don’t think there’s any “best practice” as the practice is inherently quite tricky and not widely recommended. If possible, one should always strive to avoid the multimodality completely.

If the modes are indeed very well separated, then inits should work quite well, but if the separation is not very strong, some chains will try to switch between the modes, which will usually result in divergent transitions. In those cases, hard constraints to restrict to individual modes could work a bit better (or not).
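To make the hard-constraint idea concrete, here is a minimal sketch using a plain random-walk Metropolis sampler rather than Stan's sampler. The bimodal target `log_post`, the split point at 0, and all tuning values are assumptions chosen for illustration. In Stan, the analogous restriction would be a `lower`/`upper` bound declared on the parameter.

```python
import numpy as np

def log_post(theta):
    # Hypothetical unnormalized bimodal target: equal mixture of
    # N(-3, 1) and N(3, 1).
    return np.logaddexp(-0.5 * (theta + 3.0) ** 2, -0.5 * (theta - 3.0) ** 2)

def sample_restricted(lower, upper, init, n=5000, step=1.0, seed=0):
    """Random-walk Metropolis restricted to the interval (lower, upper)."""
    rng = np.random.default_rng(seed)
    theta = init
    out = np.empty(n)
    for i in range(n):
        prop = theta + step * rng.normal()
        # Proposals outside the allowed region have zero density,
        # so they are always rejected.
        if lower < prop < upper:
            if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
                theta = prop
        out[i] = theta
    return out

left = sample_restricted(-np.inf, 0.0, init=-3.0)   # lower mode only
right = sample_restricted(0.0, np.inf, init=3.0)    # upper mode only
```

Each run targets the posterior truncated to one side of the boundary, so no chain can switch modes.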

To combine chains sampling different modes, you’d probably want to do Bayesian stacking instead of WAIC.
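For a sense of what stacking computes, here is a minimal sketch that finds weights maximizing the summed log score of the weighted mixture of pointwise predictive densities. It assumes you already have a matrix of pointwise log predictive densities (for example from leave-one-out cross-validation), with one column per mode. The fixed-point update used here is the standard EM update for mixture weights with fixed components, chosen for brevity; it is not necessarily how any particular package implements stacking. The input numbers are made up.

```python
import numpy as np

def stacking_weights(lpd, n_iter=500):
    """Stacking weights from pointwise log predictive densities.

    lpd: array of shape (n_obs, n_models).
    Maximizes sum_i log(sum_k w_k * exp(lpd[i, k])) over the simplex.
    """
    lpd = np.asarray(lpd, dtype=float)
    # Rescale each row for numerical stability; this does not move the optimum.
    dens = np.exp(lpd - lpd.max(axis=1, keepdims=True))
    w = np.full(dens.shape[1], 1.0 / dens.shape[1])
    for _ in range(n_iter):
        mix = dens @ w                                # mixture density per observation
        w = w * np.mean(dens / mix[:, None], axis=0)  # EM update, keeps sum(w) == 1
    return w

# Hypothetical pointwise densities: mode 1 predicts 15 observations
# better, mode 2 predicts the remaining 5 better.
lpd = np.log(np.array([[0.9, 0.1]] * 15 + [[0.2, 0.8]] * 5))
weights = stacking_weights(lpd)
```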

In some models, there is also a third way to do this:

  1. Introduce a discrete variable that splits the parameter space such that each potential mode belongs to a different value of the discrete variable
  2. Marginalize the variable out

Now you can sample both modes in a single chain. There’s an example of this approach in "Ideas for modelling a periodic timeseries" (post #25 by martinmodrak), where I index a set of possible modes over a frequency spectrum.
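The marginalization step can be sketched as follows. This is a generic illustration, not the code from the linked post: the two component densities and the equal weights are assumptions. In Stan the analogous operation is `log_sum_exp` over the per-value log densities.

```python
import numpy as np

def log_marginal(theta, log_weights, component_logpdfs):
    """log p(theta) = log sum_k p(z = k) * p(theta | z = k)."""
    terms = [lw + f(theta) for lw, f in zip(log_weights, component_logpdfs)]
    return np.logaddexp.reduce(terms)

def normal_logpdf(mu, sigma):
    # Returns a function that evaluates log N(x | mu, sigma).
    def logpdf(x):
        return (-0.5 * ((x - mu) / sigma) ** 2
                - np.log(sigma) - 0.5 * np.log(2.0 * np.pi))
    return logpdf

# Hypothetical setup: discrete z takes two values, one per mode.
log_weights = np.log([0.5, 0.5])
components = [normal_logpdf(-3.0, 1.0), normal_logpdf(3.0, 1.0)]

value = log_marginal(0.0, log_weights, components)
```

Because the discrete variable is summed out, the sampler only ever sees the continuous density.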

Best of luck with your model!

Hello Martin,

Thanks for the fast reply! The third way you proposed sounds interesting as well.

Regarding the Bayesian stacking: This seems to be similar to Bayesian Model Averaging, where the modes are combined.
However, I actually see the two modes as two different models that I want to compare based on their respective WAIC values for example. Is that a valid approach as well?
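For reference, a per-mode WAIC can be computed from a matrix of pointwise log-likelihood values using the usual definition (log pointwise predictive density minus the per-observation variance of the log-likelihood, on the deviance scale). This is a minimal sketch; the names `log_lik_mode1` and `log_lik_mode2` in the comments are hypothetical.

```python
import numpy as np

def waic(log_lik):
    """WAIC on the deviance scale.

    log_lik: array of shape (n_draws, n_obs) with the log-likelihood of
    each observation evaluated at each posterior draw.
    """
    log_lik = np.asarray(log_lik, dtype=float)
    n_draws = log_lik.shape[0]
    # Log pointwise predictive density: log of the mean likelihood over draws.
    lppd = np.logaddexp.reduce(log_lik, axis=0) - np.log(n_draws)
    # Effective number of parameters: variance of the log-likelihood over draws.
    p_waic = log_lik.var(axis=0, ddof=1)
    return -2.0 * np.sum(lppd - p_waic)

# Usage with draws from each mode (hypothetical arrays):
# waic_1 = waic(log_lik_mode1)
# waic_2 = waic(log_lik_mode2)
```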

That can definitely be a valid approach in some circumstances. We would, however, usually recommend using LOO-IC from the loo package instead of WAIC; see the Cross-validation FAQ.
