Clarification on Variation of density in HMC

Hello to everyone!

I have been fighting with a hierarchical model for some time, and have been reading various threads and papers about potential solutions to the problems I am encountering. In particular, when I run multiple chains, some of them display low E-BFMI, which seems to be caused by the sampler taking detours in the tails. I do not currently observe divergences, although I often saturate the maximum tree depth of 11.

When reading some of the papers recommended by Stan, like Betancourt & Girolami (2013), I sometimes struggle to understand the intuitions they try to convey, and as a result struggle to apply them to my use case. In particular, whenever HMC papers are presented, I always struggle to understand why no one ever mentions momentum resampling, or to understand its role. For example, in the above paper there is a section dedicated to limited density variations, in which the authors state:

A more subtle, but no less insidious, vulnerability of Euclidean HMC concerns density variations within a transition. In the evolution of the system, the Hamiltonian function,

H \! \left(p, q \right) = T \! \left(p | q \right) + V \! \left(q \right),

is constant, meaning that any variation in the potential energy must be compensated for by an opposite variation in the kinetic energy. In Euclidean Hamiltonian Monte Carlo, however, the kinetic energy is a \chi^{2} variate which, in expectation, varies by only half the dimensionality, d, of the target distribution. Consequently the Hamiltonian transitions are limited to

\Delta V = \Delta T \sim \frac{d}{2},

restraining the density variation within a single transition.
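To make sure I at least read the scale correctly, I wrote a quick numerical check of my own (not from the paper) that the Euclidean-Gaussian kinetic energy indeed concentrates at d/2 with typical fluctuations of order \sqrt{d/2}:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 100, 100_000               # target dimension, number of momentum draws

p = rng.standard_normal((n, d))   # p ~ N(0, I_d): Euclidean-Gaussian kinetic energy
T = 0.5 * (p**2).sum(axis=1)      # T(p) = ||p||^2 / 2, so 2T ~ chi-squared with d dof

print(T.mean())   # ~ d/2 = 50: the scale limiting Delta V per transition
print(T.std())    # ~ sqrt(d/2) ~ 7.07: typical fluctuation around that scale
```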

I have been trying to understand the above claim in detail, but I am left confused on many levels. It seems that the limited variation in the log-density part is tied to the fact that the Hamiltonian has to be (approximately) constant across an integration step, hence any variation in the log density has to be counterbalanced by an (approximately) opposite variation in the kinetic energy. It seems to me, though, that this type of analysis completely ignores the momentum resampling at every MCMC step, which clearly changes the Hamiltonian between steps (to my understanding). Under the standard implementation, the resampled kinetic energy does follow a chi-squared distribution, so I don't understand whether I am missing how the momentum resampling enters the above analysis.

Could anyone clarify the above statement, in particular its relation to the momentum resampling?

It’s the same chi-squared distribution, but momentum resampling is not directly relevant to the above analysis.

Hamiltonian Monte Carlo alternates between two transitions on the phase space: Hamiltonian evolution and momentum resampling.

I think you (inspired by the definition of E-BFMI) picture this as a process where H (aka energy) stays (almost) constant during Hamiltonian evolution and then changes abruptly on momentum resampling.
Betancourt & Girolami, on the other hand, study the behaviour of V (= -\log p), which changes during Hamiltonian evolution but remains constant on momentum resampling.

In the paragraph you quote they argue that if the initial point is from the equilibrium distribution then the kinetic energy at the end must also be chi-squared distributed on average, and this limits how much V can change during Hamiltonian evolution.
Of course these perspectives are related by the fact that H and V differ only by T (and T is always sampled perfectly). Just as Hamiltonian evolution cannot change T or V by more than \sqrt{d/2}, momentum resampling cannot change T or H by more than \sqrt{d/2}.
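A minimal sketch of that alternation (a hand-rolled leapfrog on a standard Gaussian target, not Stan's sampler, with the Metropolis correction omitted for brevity) shows the two energy behaviours side by side: H barely drifts within a trajectory, but jumps at each resampling.

```python
import numpy as np

rng = np.random.default_rng(1)
d, eps, L = 50, 0.05, 20          # dimension, step size, leapfrog steps

def grad_V(q):
    return q                      # V(q) = ||q||^2 / 2: standard Gaussian target

def H(q, p):
    return 0.5 * q @ q + 0.5 * p @ p

q = rng.standard_normal(d)        # start at (approximate) equilibrium
drift_within, jump_at_resample = [], []
H_prev = None
for _ in range(500):
    p = rng.standard_normal(d)    # momentum resampling: H jumps here
    if H_prev is not None:
        jump_at_resample.append(abs(H(q, p) - H_prev))
    h0 = H(q, p)
    # leapfrog integration: H is (nearly) conserved along the trajectory
    p = p - 0.5 * eps * grad_V(q)
    for _ in range(L - 1):
        q = q + eps * p
        p = p - eps * grad_V(q)
    q = q + eps * p
    p = p - 0.5 * eps * grad_V(q)
    drift_within.append(abs(H(q, p) - h0))
    H_prev = H(q, p)              # Metropolis correction omitted; error is tiny here

print(np.mean(drift_within))      # small: evolution (almost) conserves H
print(np.mean(jump_at_resample))  # order sqrt(d): resampling is what moves H
```

The contrast between the two printed numbers is exactly the alternation above: V changes only during the trajectory, H changes only at resampling.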

Either way, the important point is that in a hierarchical model the probability density for the parameters is often strongly correlated with the hyperparameters and consequently the posterior variance of V is much larger than d/2.
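To make that concrete, here is a toy check of my own (using Neal's funnel as a stand-in hierarchical model, sampled directly from its generative form rather than by HMC): the posterior variance of V is far larger than d/2.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 100_000, 10        # 1 log-scale hyperparameter + 9 latent parameters

# Neal's funnel: v ~ N(0, 3^2), x_i | v ~ N(0, e^v) for i = 1..d-1
v = 3.0 * rng.standard_normal(n)
x = np.exp(v / 2)[:, None] * rng.standard_normal((n, d - 1))

# potential energy V = -log p(v, x), dropping the additive constant:
# V = v^2/18 + sum_i x_i^2 / (2 e^v) + (d-1) v / 2
V = v**2 / 18 + 0.5 * np.exp(-v) * (x**2).sum(axis=1) + (d - 1) * v / 2

print(V.var())            # ~ 187, vastly larger than d/2 = 5
```

The dominant term is the (d-1)v/2 normalisation, i.e. precisely the parameter-hyperparameter coupling: moving the hyperparameter v shifts the log density of all the latent parameters at once, so a single transition with \Delta V \sim d/2 cannot traverse the funnel.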


Thank you for the detailed answer, now it is much clearer.