I’ve been reading @betanalpha’s publication (https://arxiv.org/abs/1604.00695) on diagnosing inefficient exploration of the marginal energy distribution and would like to ensure I understand its implications in the context of Stan’s implementation.

Let’s assume we use a Euclidean-Gaussian HMC sampler with potential U(q)=-\log P(q), where q is the D-dimensional parameter vector and P(q) is the target density. Because we use a Euclidean-Gaussian sampler, the momentum resampling step generates kinetic energy terms T(p)|q\sim \frac12 \chi^2(D), where p is the vector of momenta. The total energy is E(q,p)=U(q)+T(p).

So far so good; let’s consider a particular example as shown in the figure below: the sampler starts with a given parameter and freshly sampled momentum at (1), evolves the state under the Hamiltonian flow to end up at (2), resamples the momentum to obtain (3), followed by another evolution step to end up at (4).

We know that T(p_1) and T(p_3) follow the \frac 12\chi^2(D) distribution discussed above by construction. However, the kinetic energy term at (2) (and similarly for (4)) is given by

where f_q(q_1, p_1) is the evolution of q under the flow; the distribution of T(p_2) seems non-trivial and depends on q_1.

Stan only provides one `energy__`

value for each sample (which is fine because energy is conserved), but it is not obviously clear to me what the corresponding state is. I’ve run some toy models and evaluated `kinetic_energy = energy__ + lp__`

as provided by Stan. Indeed, the kinetic energy terms follow the \chi^2 distribution discussed above which leads me to the conclusion that Stan reports the state after momentum resampling but before evolution (i.e. states (1) or (2) rather than (3) or (4)). Is that interpretation correct?