Hello everyone!

I have been fighting with a hierarchical model for some time and have been reading various threads and papers about potential solutions to the problems I am encountering. In particular, when I run multiple chains, some of them display a low E-BFMI, which is caused by some detours of the HMC chains into the tails. I do not currently observe divergences, although I often saturate the maximum treedepth of 11.
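For reference, E-BFMI can be estimated from the per-iteration Hamiltonian values that Stan reports as `energy__`. A minimal sketch, assuming the usual estimator (the mean squared energy change between successive iterations divided by the variance of the energies), which as far as I know is the form used by ArviZ's `bfmi`; the function name `ebfmi` is hypothetical and not part of any Stan interface:

```python
import numpy as np


def ebfmi(energy):
    """Estimate E-BFMI for one chain from its per-iteration Hamiltonian values.

    Ratio of the mean squared energy change between successive iterations
    to the variance of the marginal energy distribution. Small values mean
    the momentum resampling moves the energy only a little compared with
    how widely the energy varies overall.
    """
    energy = np.asarray(energy, dtype=float)
    return np.mean(np.diff(energy) ** 2) / np.var(energy)


# Usage sketch: independent energies give a value near 2, while a slowly
# drifting energy (a random walk) gives a value near 0.
rng = np.random.default_rng(1)
iid_energy = rng.normal(size=4000)
random_walk_energy = np.cumsum(rng.normal(size=4000))
print(ebfmi(iid_energy), ebfmi(random_walk_energy))
```

The random-walk case mimics a chain whose energy level can only drift slowly from iteration to iteration, which is the situation a low E-BFMI warning points at.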

When reading some of the papers recommended for Stan, like Betancourt and Girolami (2013), I sometimes struggle to understand the intuitions they try to convey, and as a result I struggle to apply them to my use case. In particular, whenever HMC papers are presented, I always struggle to understand why no one ever mentions the momentum resampling, or what its role is. For example, the above paper has a section dedicated to *limited density variations*, in which the authors state:

> A more subtle, but no less insidious, vulnerability of Euclidean HMC concerns density variations within a transition. In the evolution of the system, the Hamiltonian function,
>
> $$H(p, q) = T(p \mid q) + V(q),$$
>
> is constant, meaning that any variation in the potential energy must be compensated for by an opposite variation in the kinetic energy. In Euclidean Hamiltonian Monte Carlo, however, the kinetic energy is a $\chi^{2}$ variate which, in expectation, varies by only half the dimensionality, $d$, of the target distribution. Consequently the Hamiltonian transitions are limited to
>
> $$\Delta V = \Delta T \sim \frac{d}{2},$$
>
> restraining the density variation within a single transition.
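The $\chi^{2}$ part of this claim can be checked numerically. A minimal sketch, assuming a unit (identity) Euclidean metric, so that each freshly resampled momentum is $p \sim N(0, I_d)$ and $T = p \cdot p / 2$; then $2T \sim \chi^{2}_d$, so $T$ has mean $d/2$ and variance $d/2$. The helper name `resampled_kinetic_energy` is hypothetical:

```python
import numpy as np


def resampled_kinetic_energy(d, n_draws, seed=0):
    """Draw n_draws fresh momenta p ~ N(0, I_d) and return T = p.p / 2 for each.

    Assumes a unit (identity) Euclidean metric, so 2T follows a
    chi-squared distribution with d degrees of freedom.
    """
    rng = np.random.default_rng(seed)
    p = rng.standard_normal((n_draws, d))
    return 0.5 * np.sum(p ** 2, axis=1)


for d in (10, 100, 1000):
    t = resampled_kinetic_energy(d, n_draws=5_000)
    # Theory: mean d/2, standard deviation sqrt(d/2).
    print(f"d={d:5d}  mean T={t.mean():8.2f} (d/2={d / 2:6.1f})  "
          f"sd T={t.std():6.2f} (sqrt(d/2)={np.sqrt(d / 2):5.2f})")
```

Note that this distribution does not depend on the current position $q$: every resampling draws the kinetic energy from the same $\chi^{2}_d / 2$ distribution.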

I have been trying to understand the above claim in detail, but I am left confused on many levels. It seems that the limited variation in the log-likelihood part is tied to the fact that the Hamiltonian has to be (approximately) constant along a trajectory, so any variation in the log-likelihood has to be counterbalanced by an (approximately) opposite variation in the kinetic energy. It seems to me, though, that this type of analysis completely ignores the momentum resampling at every MCMC iteration, which (to my understanding) clearly changes the Hamiltonian between iterations. Under the standard implementation, however, the kinetic energy after this resampling does follow a chi-squared distribution (more precisely, twice the kinetic energy does), so I don't understand whether I am actually missing how the momentum resampling enters the above analysis.
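One way to see the two separate sources of energy change is to track the Hamiltonian in a toy sampler. A minimal sketch, assuming a $d$-dimensional standard normal target ($V(q) = q \cdot q / 2$), a unit metric, and a plain fixed-length leapfrog integrator with a Metropolis correction rather than Stan's dynamic NUTS; all function names are hypothetical. Along each trajectory $H$ changes only by the leapfrog error, while between trajectories it jumps because the momentum is redrawn:

```python
import numpy as np


def potential(q):
    # V(q) = -log density of a standard normal target, up to a constant.
    return 0.5 * q @ q


def grad_potential(q):
    return q


def kinetic(p):
    # Unit Euclidean metric: T(p) = p.p / 2.
    return 0.5 * p @ p


def leapfrog(q, p, step_size, n_steps):
    q, p = q.copy(), p.copy()
    p -= 0.5 * step_size * grad_potential(q)
    for _ in range(n_steps - 1):
        q += step_size * p
        p -= step_size * grad_potential(q)
    q += step_size * p
    p -= 0.5 * step_size * grad_potential(q)
    return q, p


def run_hmc(d=50, n_iter=2000, step_size=0.1, n_steps=20, seed=0):
    """Toy HMC that records how the Hamiltonian changes.

    Returns (integration_errors, resampling_jumps):
      integration_errors[n]: H at the end of trajectory n minus H at its start
      resampling_jumps[n]:   H just after redrawing p minus H at the end of the
                             previous transition (q is unchanged, only T moves)
    """
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(d)
    p = rng.standard_normal(d)
    h_prev = potential(q) + kinetic(p)
    integration_errors, resampling_jumps = [], []
    for _ in range(n_iter):
        p = rng.standard_normal(d)  # momentum resampling
        h_start = potential(q) + kinetic(p)
        resampling_jumps.append(h_start - h_prev)
        q_new, p_new = leapfrog(q, p, step_size, n_steps)
        h_end = potential(q_new) + kinetic(p_new)
        integration_errors.append(h_end - h_start)
        # Metropolis correction for the leapfrog error.
        if np.log(rng.uniform()) < h_start - h_end:
            q, p, h_prev = q_new, p_new, h_end
        else:
            h_prev = h_start
    return np.array(integration_errors), np.array(resampling_jumps)


integration_errors, resampling_jumps = run_hmc()
print("mean |H change along a trajectory|:", np.mean(np.abs(integration_errors)))
print("variance of H change from resampling:", np.var(resampling_jumps))
```

With these settings the changes along a trajectory should be orders of magnitude smaller than the resampling jumps, whose variance at stationarity is $d$ for this target ($d/2$ from the fresh momentum plus $d/2$ from the discarded one). Within a trajectory, $V$ can rise by at most the kinetic energy drawn at its start (up to integration error); moving to a different energy level happens only across iterations, through the resampling.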

Could anyone clarify the above statement, in particular its relation to the momentum resampling?

Best,

Luca