Hamiltonian Monte Carlo, positive recurrence

Hey!

I do realise that my question off-topic for Stan as it is purely about theoretical HMC. Still, I thought this was the best place to try to get first help on this. If it is improper here, let me know and/or close the question.

I am struggling to understand why the Markov chain Hamiltonian Monte Carlo produces is positive recurrent. From my understanding any MCMC algorithms needs to produce a chain that is irreducible, positive recurrent and aperiodic so that the chain converges to its stationary distribution and with e.g. the Detailed Balance condition it is ensured that that stationary distribution is the target distribution we specified via potential energy.
I see that HMC produces a chain that is irreducible (because any momentum p can be drawn from the maxwell-boltzman/normal distribution) and aperiodic (because it is always possible to reject a proposal and thus every state has period 1). But with assuring positive recurrence, I don’t find a way. At first I turned to the introduction to HMC by Micheal Betancourt, but didn’t find anything. Now I found the paper “On the convergence of Hamiltonian Monte Carlo” by Alain Durmus, Éric Moulines, and Eero Saksman on arxiv (Link: https://arxiv.org/pdf/1705.00166) and the arguments should suffice but are pretty technical.

I was wondering if there is a simpler argument (maybe following from other properties of MCMC/HMC) for positive recurrence and thought that if somewhere, then here is the best place to start.

I believe to have found an answer. I have found in (Brémaud, P.: Markov Chains, Gibbs Fields, Monte Carlo Simulation, and Queues - theorems 3.1, 3.2) that an irreducible Markov chain is positive recurrent if and only if it has a stationary distribution. For HMC I already have that it is irreducible and that its stationary distribution is the target distribution (due to Detailed Balance).

I will mark my own answer as the answer. Feel free to delete this thread, otherwise I would leave it open as a thought process.

2 Likes

We prefer not to delete threads.

For understanding this material, the best resource I’ve found is:

Rosenthal’s book (A First Course in Rigorous Probability Theory or something like that) has more detail and is also highly accessible (compared to the alternatives).

Recurrence is a subtle topic on continuous state spaces, and so one typically focuses instead on demonstrating \phi-irreducibility, aperiodicity, and stationarity. See for example Theorem 4 of https://projecteuclid.org/euclid.ps/1099928648.

Demonstrating these results for an implementation of HMC is tricky and depends on the particular details of one’s implementation. For example the current implementation of HMC in Stan uses dynamic integration times and samples states from the entire discrete trajectory instead of just proposing the final point in a Metropolis step.

For this implementation we get

  • Aperiodicity because the probability of sampling the initial state from any numerical trajectory is non-zero by construction.
  • \phi-irreducibility requires much more than the momenta distribution being supported over all of \mathbb{R}^{N}. In particular we need the momenta to be able to average out over the gaps in the discrete leapfrog integrator. Because the probability of sampling a trajectory of length L = 1 is non-zero for our implementation we co-op results from Langevin Monte Carlo, https://projecteuclid.org/euclid.bj/1178291835, or we can just use the result from Durmus et al. See Section 5.1 of https://arxiv.org/abs/1601.08057 of more discussion.
  • Stationarity is proved in the appendix of https://arxiv.org/abs/1701.02434, first for chains with static integration times then for appropriate dynamic methods.

That said, these three conditions guarantee only asymptotic convergence which says nothing about how the chains will behave for finite times, which is what we really care about in practice. To get finite-time guarantees we need a central limit theorem which typically requires geometric ergodicity and minorization conditions. These are much harder to prove but we can identify obstructions which we can then turn into diagnostics as discussed in https://arxiv.org/abs/1601.08057. In particular, this is one way of motivating the divergences that we all know and love.

1 Like

Thanks for the quick answers and the provided material!

I have rewritten this post now with updated realisations.

I was under the impression that positive recurrence (combined with irreducibility and aperiodicity) is needed to ensure the asymptotical convergence to the target distribution, but as the paper on projecteuclid you provided shows, this is not the case.

Why is (positive) recurrence of any interest for MCMC then? Are there other situations where one of these properties is infeasible to prove and thus recurrence is needed?

Thanks for making me aware of the implementation-specific detail of these proofs, that was not in my mind.

The last time I frequented this forum was about two years ago. I am happy to see that it is still the helpful productive place I remembered it to be.

My understanding is that recurrence is a more natural consideration for discrete spaces, especially discrete spaces with a finite number of states. One of the challenges of going deeper into MCMC theory is that much of the introductory material is limited to finite state space perspectives, and the transition to continuous state spaces becomes a monstrous hurdle. MCMC on general state spaces is hard which is why I have focused so much on exposing diagnostics and consequences to users rather than any theory.

2 Likes

The Roberts and Rosenthal paper (and the Rosenthal book) are what enabled me to clear the hurdle (or at least knock it down and keep running!).

I found it all a lot easier when I realized a lot of the continuous results were based on an implicit discretization. That is, a continuous space is partitioned into a countable (or finite) number of sets of non-zero measure. Then you can define the transition probabilities among the elements of the partition. Then when everything’s working, that should be a well-behaved discrete chain. The continous results for recurrence, reducibility, etc., can be stated as universally quantifying over possible partitions.