Riemann manifold HMC in Stan

Hello.

I have been going through the tuning parameters for Hamiltonian Monte Carlo and the NUTS variant of the sampler. I have seen some references, including the Machine Learning Summer School from 2014, where it is explained why the Riemannian variant of HMC works better. In fact, in the MLSS 2014 lectures Michael Betancourt mentions that the Stan developers are working on incorporating this idea.

However, I have seen in the docs that the mass matrix M in the kinetic energy is estimated during warmup and then held fixed, the same everywhere in the parameter space, for the rest of sampling. So I guess that for complex geometries, with different curvature in different parts of the parameter space, a single fixed matrix will not be optimal.
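For reference, here is the distinction in the standard notation (nothing Stan-specific): Euclidean HMC uses a constant matrix in the kinetic energy,

$$
H(q, p) = -\log \pi(q) + \tfrac{1}{2} p^T M^{-1} p,
$$

while the Riemannian variant of Girolami and Calderhead lets the metric vary with position,

$$
H(q, p) = -\log \pi(q) + \tfrac{1}{2} \log \det G(q) + \tfrac{1}{2} p^T G(q)^{-1} p,
$$

so the sampler can adapt to local curvature instead of relying on a single global scale.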

My question is twofold:

  • Is Stan planning to have an implementation of Riemannian manifold HMC?
  • Is there any recommended practice for avoiding the pathology of a fixed M that I describe above? And if my claim is wrong, I would really appreciate a correction.

Thanks in advance.

4 Likes

@betanalpha

1 Like

Hi,

I’ve been experimenting with RHMC outside of Stan (but with the aim of possible eventual inclusion) with varying degrees of success.

Outside of toy problems, developing a generally applicable metric has proven to be pretty difficult. Sooner or later things break down because of the implicit steps the integrator requires.

That said, I have had some luck with an integration scheme based on [1]. However, I did not manage to Metropolize it, so it still produces biased results (although not noticeably so in the toy examples that I tested).

[1] Y.-A. Ma, T. Chen, and E. B. Fox, “A Complete Recipe for Stochastic Gradient MCMC,” arXiv:1506.04696, Oct. 2015. Available: http://arxiv.org/abs/1506.04696.
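To make the “implicit steps of the integrator” concrete, here is a minimal Python sketch of one generalized leapfrog step, the standard implicit integrator for RHMC from Girolami and Calderhead (not the scheme from [1]). The helper names, and solving the implicit updates with a fixed number of naive fixed-point iterations, are illustrative assumptions:

```python
def generalized_leapfrog(q, p, eps, dH_dq, dH_dp, n_fixed_point=6):
    """One generalized leapfrog step for a non-separable Hamiltonian
    H(q, p), as needed by Riemannian HMC.

    dH_dq(q, p) and dH_dp(q, p) return the partial derivatives of H.
    The first two updates are implicit and are solved here by naive
    fixed-point iteration; when that iteration fails to converge the
    step is invalid, which is the fragility discussed above.
    """
    # Implicit half-step in momentum:
    #   p_half = p - (eps / 2) * dH_dq(q, p_half)
    p_half = p
    for _ in range(n_fixed_point):
        p_half = p - 0.5 * eps * dH_dq(q, p_half)

    # Implicit full step in position:
    #   q_new = q + (eps / 2) * (dH_dp(q, p_half) + dH_dp(q_new, p_half))
    q_new = q
    for _ in range(n_fixed_point):
        q_new = q + 0.5 * eps * (dH_dp(q, p_half) + dH_dp(q_new, p_half))

    # Explicit half-step in momentum to finish.
    p_new = p_half - 0.5 * eps * dH_dq(q_new, p_half)
    return q_new, p_new
```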

1 Like

Have you heard about jamón? http://betanalpha.github.io/jamon/ Not sure whether this could be a possible solution.

2014 was…a more optimistic time.

Back in the early days of Stan our understanding of Hamiltonian Monte Carlo was relatively immature. Riemannian Hamiltonian Monte Carlo was a promising generalization of the existing implementations of Hamiltonian Monte Carlo, but plenty of implementation challenges remained, including finding a useful metric (https://arxiv.org/abs/1212.4693) and generalizing the original No-U-Turn algorithm to work with any variant of Hamiltonian Monte Carlo (https://arxiv.org/abs/1304.1920).

While those more theoretical results were being worked out we were also redesigning the Stan back end, and I took the opportunity to refactor the Hamiltonian Monte Carlo code in Stan to be extremely modular so that it could support different kinetic energies (constant/varying metric, metrics of different structure), different numerical integrators (explicit, implicit, varying orders), and the like. To this day there is a dynamic Riemannian Hamiltonian Monte Carlo implementation in Stan – it just isn’t exposed through most of the interfaces!

Why isn’t that Riemannian Hamiltonian Monte Carlo implementation exposed? Well, at the same time we were developing the theory and implementation for Riemannian Hamiltonian Monte Carlo we were also learning more about the tuning of the numerical integrator (https://arxiv.org/abs/1411.6669) and the convergence properties of the method. In particular, based on the stability of the numerical integrator I introduced the concept of a divergent transition as a diagnostic of incomplete exploration. The problem with Riemannian Hamiltonian Monte Carlo is that the numerical integration can be unstable in different ways, some related to the geometry (and divergences) and some related to the implementation (in particular the fixed point iterations needed to evaluate the implicit integrator). In practice it wasn’t clear how to differentiate these two issues, and hence how practitioners could respond in principled ways.
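Conceptually the divergence check itself is simple: flag a trajectory whenever the numerical error in the Hamiltonian blows up. A toy sketch (a hypothetical helper, not Stan’s API; the actual check lives in Stan’s C++ sampler, which by default flags an energy change above 1000):

```python
import math

def is_divergent(H0, H_trajectory, max_energy_error=1000.0):
    """Flag a trajectory as divergent when the numerically computed
    Hamiltonian H drifts too far above its initial value H0 (or becomes
    non-finite). The threshold mirrors Stan's default; the helper itself
    is illustrative, not Stan's API.
    """
    return any(not math.isfinite(H) or H - H0 > max_energy_error
               for H in H_trajectory)
```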

Ultimately the numerical integration of Riemannian Hamiltonian Monte Carlo was too fragile and required too much hand-tuning from the user to work reasonably well. As the goal of Stan was to hide those algorithmic details from the user as much as possible, this proved to be a very unwelcome user experience, and so enthusiasm for these methods waned.

At the same time we were learning more about the kinds of model degeneracies that tended to break the numerical integrator in Hamiltonian Monte Carlo (https://betanalpha.github.io/assets/case_studies/identifiability.html) and about various ways to resolve those degeneracies by modifying the model. Because Stan users already specify their own model through a Stan program, modifying the model was a much more natural extension of the Stan user experience than forcing users to learn more about the underlying (and complex) algorithms.
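As a concrete example of the kind of model modification meant here, consider Neal’s funnel. A minimal sketch in plain Python of the two parameterizations (standing in for the corresponding Stan programs; normalizing constants dropped from the log densities):

```python
import numpy as np

def funnel_centered_lp(tau, theta):
    """Centered: tau ~ normal(0, 3), theta[i] ~ normal(0, exp(tau / 2)).
    The posterior curvature changes drastically with tau, which is
    exactly the geometry that defeats a single fixed metric."""
    lp = -0.5 * (tau / 3.0) ** 2
    lp += np.sum(-0.5 * (theta * np.exp(-tau / 2)) ** 2 - tau / 2)
    return lp

def funnel_noncentered_lp(tau, theta_raw):
    """Non-centered: theta_raw[i] ~ normal(0, 1), with
    theta = exp(tau / 2) * theta_raw recovered deterministically.
    The transformed posterior is nearly isotropic, so Euclidean HMC
    with a constant metric handles it well."""
    lp = -0.5 * (tau / 3.0) ** 2
    lp += np.sum(-0.5 * theta_raw ** 2)
    return lp
```

In a Stan program the non-centered version amounts to declaring theta_raw in the parameters block and defining theta = exp(tau / 2) * theta_raw in transformed parameters.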

In fact I recently showed that the reparameterizations that can improve the performance of Euclidean Hamiltonian Monte Carlo (constant metric) are actually equivalent to running Riemannian Hamiltonian Monte Carlo (https://arxiv.org/abs/1910.09407). Consequently playing around with the model implementation can achieve the same performance as playing around with the algorithm!

21 Likes

Thanks for this reply. I will add it to my TODO list of things to study. Absolutely fantastic.

2 Likes