I’m trying to better understand what adapt_delta does and, in particular, which acceptance probability it targets. Neither NUTS nor the dynamic HMC implemented in Stan has an accept/reject step.
One could still compute: (i) the probability of accepting the final point of the trajectory in a hypothetical accept/reject step, (ii) the probability of accepting the point actually sampled from the trajectory, or (iii) the average probability of accepting any point along the trajectory.
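To make the distinction concrete, here is a toy Python sketch of how each of the three candidates could be computed from a single numerical trajectory. The energy errors are synthetic, not output from a real sampler:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic energy errors dH[i] = H(z_0) - H(z_i) between the initial
# state z_0 and the i-th state of the trajectory; exact integration
# would give dH[i] == 0 everywhere. These are toy values only.
dH = rng.normal(scale=0.1, size=16)

# Metropolis probability of accepting a hypothetical proposal of z_i from z_0
accept_probs = np.minimum(1.0, np.exp(dH))

# (i) accept/reject applied only to the final point of the trajectory
p_final = accept_probs[-1]

# (ii) acceptance probability of the point actually sampled from the
# trajectory, drawn here with Boltzmann weights proportional to exp(-H(z_i))
weights = np.exp(dH)
i_sampled = rng.choice(len(dH), p=weights / weights.sum())
p_sampled = accept_probs[i_sampled]

# (iii) uniform average over every point along the trajectory
p_average = accept_probs.mean()

print(p_final, p_sampled, p_average)
```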
With NUTS and dynamic HMC, we generate a multiplicative trajectory expansion, but my understanding is that all the points generated along the way should appear in the final trajectory.
From the NUTS paper (Hoffman & Gelman 2014),
[The targeted probability] is the average acceptance probability that HMC would give to the position-momentum states explored during the final doubling iteration.
Ok, this makes sense and it seems consistent with what is described in Section 5 of @betanalpha’s conceptual introduction to HMC. But I would love some confirmation that this is indeed what we do for dynamic HMC and what is currently implemented in Stan.
adapt_delta is the target acceptance statistic during warmup: it is the value the dual-averaging algorithm tries to hit. The difference between the historical acceptance statistic and this target (much like a loss function; it is an optimization process, after all) is used to adjust the step size. See the Stan source here; delta_ in the code corresponds to adapt_delta. So yes, it is used in NUTS.
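For reference, here is a simplified Python sketch of the dual-averaging update from Algorithm 5 of Hoffman & Gelman (2014); the function and variable names are mine, but delta plays exactly the role of adapt_delta:

```python
import numpy as np

def dual_averaging_stepsize(accept_stats, eps0, delta=0.8,
                            gamma=0.05, t0=10.0, kappa=0.75):
    """Sketch of dual averaging (Hoffman & Gelman 2014, Algorithm 5).

    accept_stats: sequence of per-iteration acceptance statistics alpha_t
    eps0: initial step size; delta: target acceptance statistic (adapt_delta)
    """
    mu = np.log(10.0 * eps0)        # shrinkage point for the log step size
    log_eps_bar, h_bar = 0.0, 0.0   # averaged iterate and averaged "error"
    for t, alpha in enumerate(accept_stats, start=1):
        # running average of the gap between target and observed acceptance
        eta = 1.0 / (t + t0)
        h_bar = (1.0 - eta) * h_bar + eta * (delta - alpha)
        # propose a new log step size, shrinking toward mu
        log_eps = mu - (np.sqrt(t) / gamma) * h_bar
        # Polyak-style averaging of the log step size iterates
        w = t ** (-kappa)
        log_eps_bar = w * log_eps + (1.0 - w) * log_eps_bar
    return np.exp(log_eps_bar)      # step size frozen after warmup
```

If the observed acceptance statistics run below delta, the step size is driven down, and vice versa, which is the loss-function-like behavior described above.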
Dynamic Hamiltonian Monte Carlo, including NUTS and the current implementation in Stan, is not a Metropolis-Hastings method and so there is no well-defined notion of “acceptance probability”. On the other hand, the original “Hybrid Monte Carlo” implementation of Hamiltonian Monte Carlo is a Metropolis-Hastings method, using the terminal state of a numerical trajectory as a Metropolis proposal that is then accepted or rejected.
The existing analyses of the optimal integrator step size for Hamiltonian Monte Carlo are limited to this Hybrid Monte Carlo setting because they use the number of Metropolis rejections as a measure of performance. Ultimately they derive bounds on the Metropolis acceptance probability that achieves optimal exploration per unit of computational effort, and under nice enough conditions (no divergences, etc.) the integrator step size can be tuned to achieve the optimal acceptance probability.
In order to apply these results to the non-Metropolis-Hastings methods one has to come up with a way to approximate the more sophisticated Hamiltonian transitions as Metropolis transitions. The heuristic used in the original NUTS paper, and still used in Stan, considers a numerical trajectory as a collection of shorter trajectories between the initial state and each intermediate state. A transition from the initial state to an intermediate state can then be very roughly approximated by a Metropolis transition using the corresponding trajectory segment. Naively averaging over all of these possible Metropolis transitions using uniform weights results in (iii).
The problem with this heuristic is that the intermediate states, and hence the corresponding proxy Metropolis transitions, are not sampled uniformly. Instead they are sampled relative to the local value of the Hamiltonian function. Weighting the proxy Metropolis proposals in the same way better aligns the heuristic with the actual behavior of the sampler, resulting in an immediate improvement in performance.
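To illustrate the difference, here is a minimal sketch under the same toy assumptions as the earlier example, contrasting the uniform average (iii) with a Boltzmann-weighted average:

```python
import numpy as np

rng = np.random.default_rng(1)
dH = rng.normal(scale=0.1, size=16)          # toy energy errors H(z_0) - H(z_i)
accept_probs = np.minimum(1.0, np.exp(dH))   # proxy Metropolis acceptance probabilities

# Heuristic (iii): average the proxy transitions with uniform weights
uniform_stat = accept_probs.mean()

# Improved heuristic: weight each proxy Metropolis transition by the same
# Boltzmann weight, proportional to exp(-H(z_i)), used to sample states
# from the trajectory
weights = np.exp(dH)
weighted_stat = np.sum(weights * accept_probs) / weights.sum()

print(uniform_stat, weighted_stat)
```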