Arguments for HMC

A few years ago, I was at a Bayesian conference when one of the speakers said, somewhat apologetically, that they had started work on their project two years earlier when “everybody was into Stan, but of course now everybody uses Bouncy Particle Sampler”. I laughed at the absurd fashion-chasing, but now I’m less sure. This kind of concern can come up in training courses, coaching, and so on.

I’m interested in what people say in such a discussion. What are your arguments in favour of HMC? There’s of course a good reason to use a probabilistic programming language, but setting that aside…

I believe there have been several discussions on this topic. IIRC, this is the most recent one.

Nice, thanks for pointing me to that. I especially appreciated contributions there from @maxbiostat and @spinkney. I was less interested in defending Stan the platform (that seems straightforward) than in defending HMC/NUTS.

TL;DR: Dynamic HMC is robust enough and has good diagnostics. You might find an algorithm that outperforms it, but the gain had better be worth the implementation overhead.

I honestly find this a bit weird. The way I see it, algorithms are ways of solving problems. What matters is the problem, not the algorithm. So whenever someone says “HMC >> RWMH” or “Bouncy >> HMC”, my first question is: “for which target(s)?” and the second is “what kind of gains are we talking about?”. As an applied statistician, I care deeply about having something dependable I can write my models in so I can focus on the modelling. I love the fact I can write the model I have in mind in Stan and get posterior expectations in a fast and robust fashion.

So Stan is great because it’s a well-maintained language with a mature implementation of a sampling method (dynamic HMC, dHMC) which is robust enough to work well in many cases. More importantly, perhaps, there are built-in diagnostics that tell you when things go awry.

As regards dHMC specifically [which I think is really the core of your question], I think the particular implementation in Stan is robust enough to outperform naive Gibbs and MH for most targets people care about. Coupled with the fact that it’s easy to use thanks to Stan, that makes it really attractive as a first stab at solving a particular sampling problem. The diagnostics, which are not exclusive to Stan and thus should be associated with dHMC, are also a big plus.
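To make the diagnostics point a bit more concrete, here is a minimal sketch of one energy-based diagnostic usually associated with HMC, the E-BFMI, computed from a chain's per-iteration Hamiltonian energies. The function name `e_bfmi` and the toy energy values are my own for illustration, and this is one common form of the estimator (ratio of sums), not necessarily the exact code any particular package runs.

```python
def e_bfmi(energies):
    """Estimate E-BFMI from a single chain's per-iteration energies.

    Ratio of the summed squared energy changes between consecutive
    iterations to the summed squared deviations of the energy from
    its mean. Hypothetical helper, written for illustration only.
    """
    n = len(energies)
    if n < 2:
        raise ValueError("need at least two energy values")
    mean_energy = sum(energies) / n
    # How much the energy moves from one iteration to the next.
    numer = sum((energies[i] - energies[i - 1]) ** 2 for i in range(1, n))
    # How much the energy varies overall across the chain.
    denom = sum((e - mean_energy) ** 2 for e in energies)
    if denom == 0.0:
        raise ValueError("energies are constant; E-BFMI is undefined")
    return numer / denom


# Toy values (made up): a chain whose energy drifts slowly scores lower
# than one whose energy is refreshed substantially at every iteration.
drifting = [0.0, 1.0, 2.0, 3.0, 4.0]
well_mixed = [0.0, 1.0, 0.0, 1.0]
print(e_bfmi(drifting), e_bfmi(well_mixed))
```

Low values suggest the momentum resampling is not exploring the energy distribution well. A rule of thumb sometimes quoted is to worry below roughly 0.3, but treat that threshold as an assumption and check what your own software uses.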

There are models which are not very easy to fit with HMC (in Stan), such as those with discrete latent structure, but the silver lining is that marginalising over the discrete parameters actually leads to more efficient estimators of the other quantities in the model.
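For anyone who hasn't seen the marginalisation trick, here is a minimal sketch under assumptions of my own: a two-component normal mixture with a shared scale, where each observation has a discrete indicator z. All function and parameter names are hypothetical. The indicator is summed out on the log scale, so the likelihood depends only on continuous parameters, and the membership probabilities can still be recovered afterwards.

```python
import math


def normal_lpdf(y, mu, sigma):
    """Log density of Normal(mu, sigma) evaluated at y."""
    return (-0.5 * math.log(2.0 * math.pi) - math.log(sigma)
            - 0.5 * ((y - mu) / sigma) ** 2)


def log_sum_exp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def component_terms(y, theta, mu0, mu1, sigma):
    """Joint log terms log p(z) + log p(y | z) for z = 1 and z = 0."""
    z1 = math.log(theta) + normal_lpdf(y, mu1, sigma)
    z0 = math.log1p(-theta) + normal_lpdf(y, mu0, sigma)
    return z1, z0


def marginal_log_lik(ys, theta, mu0, mu1, sigma):
    """Log likelihood with each discrete indicator z summed out."""
    return sum(log_sum_exp(*component_terms(y, theta, mu0, mu1, sigma))
               for y in ys)


def membership_prob(y, theta, mu0, mu1, sigma):
    """P(z = 1 | y, parameters), recovered after marginalising."""
    z1, z0 = component_terms(y, theta, mu0, mu1, sigma)
    return math.exp(z1 - log_sum_exp(z1, z0))


# Made-up data and parameter values, purely for illustration.
ys = [-1.2, 0.0, 0.9]
print(marginal_log_lik(ys, theta=0.5, mu0=-1.0, mu1=1.0, sigma=1.0))
print(membership_prob(0.9, theta=0.5, mu0=-1.0, mu1=1.0, sigma=1.0))
```

The efficiency gain comes from using the exact conditional probability of z given the continuous parameters at each draw, rather than a sampled 0/1 value, which is the Rao-Blackwell argument.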

One of the things I work on is phylogenetics, where HMC is slowly coming through, but I’m sure other (non-reversible, mainly) algorithms could still provide substantial gains. So, in that realm, HMC could well not be the best choice. If someone comes up with Stan for trees, though, the arguments above might apply.

Good point about diagnostics. Anyone writing their own code for newer sampling algorithms will have to pick over outputs forensically.
