Simulated tempering and multimodal sampling

I am advertising an augmented sampling method from the preprint Adaptive Path Sampling in Metastable Posterior Distributions (https://arxiv.org/abs/2009.00471) by Collin Cademartori, Aki, Andrew, and me. We develop an efficient and automated implementation of path sampling and adaptive simulated tempering. The application is sampling from multimodal distributions, and the method is readily employable in Stan. Unlike multi-chain stacking, the target posterior is “exact”, or at least aims to be exact, this time.

The term “path sampling” can refer to many distinct concepts. What we mean here is normalizing constant (free energy) computation via thermodynamic integration (TI). TI is an old idea dating back to the 1930s. Tempering is another old method (from the 90s), but it has dimension limitations. In traditional tempering schemes, the number of interpolating densities scales linearly with the dimension, soon becoming unaffordable in high-D (plus a random walk in temperature space is inefficient in the first place). Our new method samples the inverse temperature continuously (hence it is gradient-informed) and uses path sampling to estimate, parametrically smooth, and update the normalizing constant. Computationally, this method is able to achieve higher accuracy and efficiency (ESS per second) in metastable densities. It shares certain similarities with Michael’s early work on Adiabatic Monte Carlo, and can be viewed as a reversible adiabatic process, if I am allowed to make up analogies.
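To make the thermodynamic integration idea concrete, here is a minimal Python sketch on a toy problem. It is not the adaptive, continuous-temperature scheme from the paper. It only checks the underlying identity, d/dλ log Z(λ) = E_λ[log q1(θ) − log q0(θ)], on a fixed grid of λ values. The two unnormalized Gaussian densities, the grid size, and all function names are assumptions chosen for illustration. The toy is picked so that each intermediate density can be sampled exactly and the true answer, log(Z1/Z0) = log σ, is known.

```python
import numpy as np

def log_q0(theta):
    # Unnormalized log density of N(0, 1): the "easy" end of the path.
    return -0.5 * theta**2

def log_q1(theta, sigma=3.0):
    # Unnormalized log density of N(0, sigma^2): the "target" end of the path.
    return -0.5 * (theta / sigma) ** 2

def ti_log_ratio(sigma=3.0, n_grid=51, n_draws=20_000, seed=0):
    """Estimate log(Z1 / Z0) by thermodynamic integration along the
    geometric path q_lam = q0^(1 - lam) * q1^lam."""
    rng = np.random.default_rng(seed)
    lambdas = np.linspace(0.0, 1.0, n_grid)
    grads = np.empty(n_grid)
    for i, lam in enumerate(lambdas):
        # In this toy case q_lam is itself Gaussian with this precision,
        # so we can draw from it exactly instead of running MCMC.
        precision = (1.0 - lam) + lam / sigma**2
        theta = rng.normal(0.0, 1.0 / np.sqrt(precision), size=n_draws)
        # Monte Carlo estimate of E_lam[ d/dlam log q_lam(theta) ].
        grads[i] = np.mean(log_q1(theta, sigma) - log_q0(theta))
    # Trapezoid rule over lam in [0, 1].
    return float(np.sum(0.5 * (grads[1:] + grads[:-1]) * np.diff(lambdas)))

estimate = ti_log_ratio()
print(f"TI estimate: {estimate:.3f}, exact log(sigma): {np.log(3.0):.3f}")
```

In a real problem the intermediate densities cannot be sampled exactly, which is where the tempering sampler comes in.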

We created an R package (https://github.com/yao-yl/path-tempering) that allows a black-box implementation of path sampling in Stan (thanks to help from @rok_cesnovar). It constructs a geometric bridge between the original Stan model and any alternative model. Even when there is obvious metastability, it may sometimes be more efficient to fit two models together (so the parameters can “talk”).
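As a rough illustration of what a geometric bridge does, here is a small Python sketch. It is not the package's API: the function names, the wide-normal base, and the two-mode target are all hypothetical. The bridge is taken to be log q(θ | λ) = (1 − λ) log base(θ) + λ log target(θ), so λ = 0 recovers the base and λ = 1 recovers the target.

```python
import numpy as np

def log_base(theta):
    # Hypothetical "alternative model": a wide normal, N(0, 5^2), unnormalized.
    return -0.5 * (theta / 5.0) ** 2

def log_target(theta):
    # Hypothetical metastable target: two unit-variance modes at -4 and +4.
    return np.logaddexp(-0.5 * (theta - 4.0) ** 2, -0.5 * (theta + 4.0) ** 2)

def log_bridge(theta, lam):
    # Geometric bridge: a weighted average of the two log densities.
    return (1.0 - lam) * log_base(theta) + lam * log_target(theta)

def barrier(lam):
    # Log-density gap between a target mode (theta = 4) and the valley (theta = 0).
    return float(log_bridge(4.0, lam) - log_bridge(0.0, lam))

for lam in (0.0, 0.5, 1.0):
    print(f"lambda = {lam:.1f}: barrier = {barrier(lam):.2f}")
```

The barrier between the modes shrinks as λ moves toward 0, which is why a sampler that is free to move in λ can cross between modes that would trap it at λ = 1.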

If you have encountered any metastable sampling in a real problem, feel free to try this new sampling scheme using Stan. Of course, tempering itself is unlikely to scale well to really high-D, and the new method is not a panacea for metastability, but we expect it to be better than many alternatives.

This is so cool! Thank you and everyone involved.

Two related questions.

  1. In the GitHub repo, do you have any suggestions on when this will not work well outside of the 3 cases, or
  2. things that won’t work well when working within the paradigm of those 3 cases?