HMC (jittered) vs. NUTS on 1000-dimensional standard normal

This is great. I just have a few comments.

I think it will be helpful to separate static HMC methods (which use a fixed number of leapfrog steps) from the “vintage” HMC method, which uses a fixed number of leapfrog steps but considers only the end point of the trajectory in a Metropolis proposal. Even with a fixed number of leapfrog steps one can improve considerably on vintage HMC by sampling from the entire trajectory instead of just using the end point.
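
To make the distinction concrete, here is a minimal sketch of both updates for a standard normal target. The names (`leapfrog_step`, `vintage_hmc_update`, `static_multinomial_update`) and the bookkeeping are my own illustration, not Stan's implementation; the static scheme places the current state uniformly within the fixed-length trajectory so that the update stays reversible.

```python
import numpy as np

def grad_U(q):
    # Standard normal target: U(q) = q.q / 2, so grad U(q) = q.
    return q

def hamiltonian(q, p):
    return 0.5 * q @ q + 0.5 * p @ p

def leapfrog_step(q, p, eps):
    p = p - 0.5 * eps * grad_U(q)
    q = q + eps * p
    p = p - 0.5 * eps * grad_U(q)
    return q, p

def vintage_hmc_update(q, eps, n_steps, rng):
    # Vintage HMC: Metropolis accept/reject on the end point only.
    p = rng.standard_normal(q.shape)
    h0 = hamiltonian(q, p)
    q_new, p_new = q, p
    for _ in range(n_steps):
        q_new, p_new = leapfrog_step(q_new, p_new, eps)
    if rng.uniform() < np.exp(h0 - hamiltonian(q_new, p_new)):
        return q_new
    return q

def static_multinomial_update(q, eps, n_steps, rng):
    # Same fixed trajectory length, but sample a point from the whole
    # trajectory with weights proportional to exp(-H).
    p = rng.standard_normal(q.shape)
    n_back = rng.integers(0, n_steps + 1)   # uniform placement of the start
    states = [(q, p)]
    qb, pb = q, p
    for _ in range(n_back):                 # integrate backwards in time
        qb, pb = leapfrog_step(qb, pb, -eps)
        states.insert(0, (qb, pb))
    qf, pf = q, p
    for _ in range(n_steps - n_back):       # integrate forwards in time
        qf, pf = leapfrog_step(qf, pf, eps)
        states.append((qf, pf))
    h = np.array([hamiltonian(qs, ps) for qs, ps in states])
    w = np.exp(h.min() - h)                 # numerically stable exp(-H) weights
    return states[rng.choice(len(states), p=w / w.sum())][0]
```

Iterating either update, e.g. `q = static_multinomial_update(q, 0.2, 32, rng)` with `rng = np.random.default_rng(0)`, gives the two chains to compare.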

Speaking of sampling points from a trajectory, the current implementation in Stan isn’t exactly a multinomial sample from the entire trajectory. Following the original NUTS implementation, it instead first samples a sub-trajectory (i.e. one of the trajectory expansions) with a bias towards later sub-trajectories, and then samples a point from that sub-trajectory. See Section A.3 of https://arxiv.org/abs/1701.02434. This is largely what causes the antithetic behavior for the means.
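
A rough sketch of just that selection rule (the names are hypothetical; in the real sampler each weight would be the sum of exp(-H) over the states of one trajectory expansion, and each expansion's own sample would itself be drawn multinomially within that sub-trajectory):

```python
import numpy as np

def biased_progressive_select(expansion_weights, expansion_samples, rng):
    # Keep a running sample across trajectory doublings; each new
    # sub-trajectory's sample replaces it with probability
    # min(1, W_new / W_old), biasing selection toward later expansions.
    sample = expansion_samples[0]
    w_total = expansion_weights[0]
    for w_new, s_new in zip(expansion_weights[1:], expansion_samples[1:]):
        if rng.uniform() < min(1.0, w_new / w_total):
            sample = s_new
        w_total += w_new
    return sample
```

This isolates only the selection rule; in the full algorithm the loop runs as the trajectory is expanded, and expansion can terminate early on a U-turn or a divergence.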

When considering the optimal number of leapfrog steps for vintage HMC, one has to account for the error in the simulated trajectory. The expected jump distance isn’t quite uniform because the error has a fairly systematic structure – it starts off small, grows, then shrinks again as you move towards the other side of the energy level set. See Figure 5 of https://arxiv.org/abs/1411.6669 for an illustration of the behavior (which is actually exact for a 1D Gaussian target). In other words, once the Metropolis acceptance step is taken into account, the expected jump distance will stay close to the initial point for a while before jumping further away once the number of leapfrog steps gets large enough to probe the other side of the level set.
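
This is easy to check numerically in the 1D Gaussian case; a small sketch of my own (the step size and trajectory length are arbitrary illustrative values) that prints the energy error after each leapfrog step of a single trajectory:

```python
import numpy as np

# Energy error along one leapfrog trajectory for a 1D standard normal
# target. The error's magnitude starts small, grows, then shrinks back
# toward zero as the trajectory sweeps around to the other side of the
# energy level set, and the pattern repeats periodically.
eps, n_steps = 0.5, 40
q, p = 1.0, 0.0
h0 = 0.5 * (q * q + p * p)
errors = []
for _ in range(n_steps):
    p -= 0.5 * eps * q          # half momentum step (dU/dq = q)
    q += eps * p                # full position step
    p -= 0.5 * eps * q          # half momentum step
    errors.append(0.5 * (q * q + p * p) - h0)
print(np.round(errors, 4))
```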

Also, if you extend the step sizes under consideration past 2, you’ll see the instability of the symplectic integrator set in and the effective sample size plummet (although that won’t mean much anymore, as you’ll no longer have unbiased estimators).
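
For a unit-scale Gaussian the leapfrog stability threshold is exactly 2, so the blow-up is easy to see directly; a quick sketch with step sizes chosen to bracket the threshold:

```python
# Leapfrog on a 1D standard normal target is stable only for eps < 2;
# just past that the iterates grow exponentially.
for eps in (1.9, 2.1):
    q, p = 1.0, 0.0
    for _ in range(50):
        p -= 0.5 * eps * q
        q += eps * p
        p -= 0.5 * eps * q
    print(f"eps = {eps}: |q| after 50 steps = {abs(q):.3e}")
# eps = 1.9 stays bounded; eps = 2.1 blows up.
```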
