I’ve been reading @betanalpha’s publication (https://arxiv.org/abs/1604.00695) on diagnosing inefficient exploration of the marginal energy distribution and would like to ensure I understand its implications in the context of Stan’s implementation.
Let’s assume we use a Euclidean-Gaussian HMC sampler with potential U(q)=-\log P(q), where q is the D-dimensional parameter vector and P(q) is the target density. Because we use a Euclidean-Gaussian sampler, the momentum resampling step generates kinetic energy terms T(p)|q\sim \frac12 \chi^2(D), where p is the vector of momenta. The total energy is E(q,p)=U(q)+T(p).
So far so good; let’s consider a particular example as shown in the figure below: the sampler starts with a given parameter and freshly sampled momentum at (1), evolves the state under the Hamiltonian flow to end up at (2), resamples the momentum to obtain (3), followed by another evolution step to end up at (4).
We know that T(p_1) and T(p_3) follow the \frac 12\chi^2(D) distribution discussed above by construction. However, the kinetic energy term at (2) (and similarly for (4)) is given by
where f_q(q_1, p_1) is the evolution of q under the flow; the distribution of T(p_2) seems non-trivial and depends on q_1.
Stan only provides one
energy__ value for each sample (which is fine because energy is conserved), but it is not obviously clear to me what the corresponding state is. I’ve run some toy models and evaluated
kinetic_energy = energy__ + lp__ as provided by Stan. Indeed, the kinetic energy terms follow the \chi^2 distribution discussed above which leads me to the conclusion that Stan reports the state after momentum resampling but before evolution (i.e. states (1) or (2) rather than (3) or (4)). Is that interpretation correct?