Hamiltonian Monte Carlo, positive recurrence

felixmueller · May 20, 2019, 9:30am

Hey!

I do realise that my question off-topic for Stan as it is purely about theoretical HMC. Still, I thought this was the best place to try to get first help on this. If it is improper here, let me know and/or close the question.

I am struggling to understand why the Markov chain Hamiltonian Monte Carlo produces is positive recurrent. From my understanding any MCMC algorithms needs to produce a chain that is irreducible, positive recurrent and aperiodic so that the chain converges to its stationary distribution and with e.g. the Detailed Balance condition it is ensured that that stationary distribution is the target distribution we specified via potential energy.
I see that HMC produces a chain that is irreducible (because any momentum p can be drawn from the maxwell-boltzman/normal distribution) and aperiodic (because it is always possible to reject a proposal and thus every state has period 1). But with assuring positive recurrence, I don’t find a way. At first I turned to the introduction to HMC by Micheal Betancourt, but didn’t find anything. Now I found the paper “On the convergence of Hamiltonian Monte Carlo” by Alain Durmus, Éric Moulines, and Eero Saksman on arxiv (Link: https://arxiv.org/pdf/1705.00166) and the arguments should suffice but are pretty technical.

I was wondering if there is a simpler argument (maybe following from other properties of MCMC/HMC) for positive recurrence and thought that if somewhere, then here is the best place to start.

felixmueller · May 20, 2019, 3:41pm

I believe to have found an answer. I have found in (Brémaud, P.: Markov Chains, Gibbs Fields, Monte Carlo Simulation, and Queues - theorems 3.1, 3.2) that an irreducible Markov chain is positive recurrent if and only if it has a stationary distribution. For HMC I already have that it is irreducible and that its stationary distribution is the target distribution (due to Detailed Balance).

I will mark my own answer as the answer. Feel free to delete this thread, otherwise I would leave it open as a thought process.

Bob_Carpenter · May 20, 2019, 8:51pm

We prefer not to delete threads.

For understanding this material, the best resource I’ve found is:

Gareth Roberts and Jeff Rosenthal. 2003. Markov chain Monte Carlo.

Rosenthal’s book (A First Course in Rigorous Probability Theory or something like that) has more detail and is also highly accessible (compared to the alternatives).

betanalpha · May 21, 2019, 1:26am

Recurrence is a subtle topic on continuous state spaces, and so one typically focuses instead on demonstrating \phi-irreducibility, aperiodicity, and stationarity. See for example Theorem 4 of https://projecteuclid.org/euclid.ps/1099928648.

Demonstrating these results for an implementation of HMC is tricky and depends on the particular details of one’s implementation. For example the current implementation of HMC in Stan uses dynamic integration times and samples states from the entire discrete trajectory instead of just proposing the final point in a Metropolis step.

For this implementation we get

Aperiodicity because the probability of sampling the initial state from any numerical trajectory is non-zero by construction.
\phi-irreducibility requires much more than the momenta distribution being supported over all of \mathbb{R}^{N}. In particular we need the momenta to be able to average out over the gaps in the discrete leapfrog integrator. Because the probability of sampling a trajectory of length L = 1 is non-zero for our implementation we co-op results from Langevin Monte Carlo, https://projecteuclid.org/euclid.bj/1178291835, or we can just use the result from Durmus et al. See Section 5.1 of https://arxiv.org/abs/1601.08057 of more discussion.
Stationarity is proved in the appendix of https://arxiv.org/abs/1701.02434, first for chains with static integration times then for appropriate dynamic methods.

That said, these three conditions guarantee only asymptotic convergence which says nothing about how the chains will behave for finite times, which is what we really care about in practice. To get finite-time guarantees we need a central limit theorem which typically requires geometric ergodicity and minorization conditions. These are much harder to prove but we can identify obstructions which we can then turn into diagnostics as discussed in https://arxiv.org/abs/1601.08057. In particular, this is one way of motivating the divergences that we all know and love.

felixmueller · May 21, 2019, 10:35am

Thanks for the quick answers and the provided material!

I have rewritten this post now with updated realisations.

I was under the impression that positive recurrence (combined with irreducibility and aperiodicity) is needed to ensure the asymptotical convergence to the target distribution, but as the paper on projecteuclid you provided shows, this is not the case.

Why is (positive) recurrence of any interest for MCMC then? Are there other situations where one of these properties is infeasible to prove and thus recurrence is needed?

Thanks for making me aware of the implementation-specific detail of these proofs, that was not in my mind.

The last time I frequented this forum was about two years ago. I am happy to see that it is still the helpful productive place I remembered it to be.

betanalpha · May 21, 2019, 2:25pm

My understanding is that recurrence is a more natural consideration for discrete spaces, especially discrete spaces with a finite number of states. One of the challenges of going deeper into MCMC theory is that much of the introductory material is limited to finite state space perspectives, and the transition to continuous state spaces becomes a monstrous hurdle. MCMC on general state spaces is hard which is why I have focused so much on exposing diagnostics and consequences to users rather than any theory.

Bob_Carpenter · June 4, 2019, 6:12pm

The Roberts and Rosenthal paper (and the Rosenthal book) are what enabled me to clear the hurdle (or at least knock it down and keep running!).

I found it all a lot easier when I realized a lot of the continuous results were based on an implicit discretization. That is, a continuous space is partitioned into a countable (or finite) number of sets of non-zero measure. Then you can define the transition probabilities among the elements of the partition. Then when everything’s working, that should be a well-behaved discrete chain. The continous results for recurrence, reducibility, etc., can be stated as universally quantifying over possible partitions.

Topic		Replies	Views
Insights from Dynamic HMC animation Algorithms	2	978	September 16, 2020
Tried to Speed Up Stan (HMC) by Rewriting Dynamics, failed Algorithms	2	676	June 17, 2020
Details of initial convergence towards typical set in Hamiltonian Monte Carlo Algorithms	3	68	November 7, 2024
Quick question about HMC General	3	473	September 21, 2023
Standalone HMC sampler Algorithms mcmc	7	2271	May 3, 2018

Hamiltonian Monte Carlo, positive recurrence

Related topics