Details of initial convergence towards typical set in Hamiltonian Monte Carlo

After reading the introductory HMC paper, I’m confused about the initial warm-up phase. What exactly is happening there?

I’m especially interested in the mathematical details behind

a) The chain rapidly approaching the typical set, and

b) warm-up stopping criteria, i.e. how the chain knows that it got close enough to the typical set and can start leapfrogging now.

Thank you

There are a lot of mathematical details here.

a) You want to look at convergence of MCMC to the stationary (aka equilibrium) distribution. There are lots of ways to measure that. In cases where you can prove geometric ergodicity, you get exponential decay in total variation distance (i.e., the maximum error in calculating an event probability). Another approach is to look at convergence of square error in expectations. You can also do this empirically by monitoring the trace plot of the log density function. When it asymptotes and then starts bouncing around, you’re there. I really like this paper for a good high-level and not too mathematically intense intro from Roberts and Rosenthal:

Section 3.4 covers geometric ergodicity. Since you asked for math, here are two papers about geometric ergodicity for HMC:

b) We don’t have warmup stopping criteria in Stan. We run HMC/NUTS and hence leapfrog, during warmup. By default, Phase I of warmup runs 75 iterations (when there are 1000 warmup iterations), because we’ve found that works well empirically. Sometimes it takes a little longer. It’d be better to control this somehow, but we haven’t tried yet.