Subtree acceptance probability in base_nuts.hpp:166

Bob_Carpenter · June 5, 2026, 7:40pm

Nope. You got it right.

The motivating factor for designing NUTS was trying to maximize expected squared jump distance. The original NUTS paper talks about this and it’s all implicit in their code.

We talk about it in much more detail in our GIST papers and I have what I think is a much cleaner C++ implementation in the Walnuts repository here:

github.com/flatironinstitute/walnuts

include/walnuts/walnuts.hpp

main

#pragma once

#include <cmath>
#include <cstdlib>
#include <functional>
#include <limits>
#include <optional>
#include <random>
#include <stdexcept>
#include <string>
#include <tuple>
#include <utility>

#include <Eigen/Dense>

#include <walnuts/concepts.hpp>
#include <walnuts/util.hpp>
#include <walnuts/validate.hpp>

namespace walnuts::detail {

This file has been truncated.

The expected squared jump distance is proportional to (the square of? the square root of?) effective sample size. @andrewgelman said this to @matthewdhoffman about a bajillion times as Matt was developing NUTS, and Matt’s solution of using “biased progressive sampling” (not Matt’s name for it) was brilliant. It does have one drawback, though: it tends to boost the effective sample size for estimating the parameters themselves, but slightly deflates the effective sample size for estimating their variances. I did a post on this ages ago that (I hope) helps visualize this issue with a 1000-dimensional standard normal target:

Stan forums (i.e., here!): HMC (jittered) vs. NUTS on 1000-dimensional standard normal

One thing that’s apparent here is that some step sizes lead to strongly anti-correlated draws, and when you go too far with that, the variance estimates are terrible. So we can’t just maximize expected squared jump distance. We need to balance it, or, as Chris Sherlock pointed out in his discussion of his Apogee-to-Apogee sampler, we can try to maximize the expected squared jump distance of our parameters squared rather than of the parameters themselves.
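To make the anti-correlation point concrete, here is a toy calculation rather than anything NUTS-specific. It assumes the draws behave like a stationary Gaussian AR(1) chain with unit variance and lag-1 correlation rho. Under that assumption the expected squared jump distance is 2(1 - rho), the lag-k autocorrelation of the draws is rho^k, and the lag-k autocorrelation of the squared draws is rho^(2k). The function names are made up for this illustration.

```cpp
#include <cassert>
#include <cmath>

// Effective sample size per draw for a chain whose lag-k autocorrelation
// is r^k:  1 / (1 + 2 * sum_{k>=1} r^k) = (1 - r) / (1 + r).
inline double ess_per_draw_geometric(double r) {
  assert(r > -1.0 && r < 1.0);  // stationarity
  return (1.0 - r) / (1.0 + r);
}

// Assumed toy model: Gaussian AR(1), unit variance, lag-1 correlation rho.

// E[(x_{t+1} - x_t)^2] = 2 * (1 - rho); largest as rho approaches -1.
inline double esjd(double rho) { return 2.0 * (1.0 - rho); }

// ESS per draw for estimating E[x]; autocorrelations are rho^k.
inline double ess_per_draw_mean(double rho) {
  return ess_per_draw_geometric(rho);
}

// ESS per draw for estimating E[x^2]; for jointly normal draws the
// squares have autocorrelations rho^(2k), which are never negative.
inline double ess_per_draw_square(double rho) {
  return ess_per_draw_geometric(rho * rho);
}
```

In this toy model, rho = -0.9 gives an expected squared jump distance of 3.8 and about 19 effective draws per iteration for the mean, but only about 0.105 effective draws per iteration for the second moment. Pushing rho toward -1 keeps raising the jump distance while the variance estimate gets worse, which is the trade-off described above.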

There’s more analysis of this in our Walnuts paper, specifically Figure 3, which shows the effect of biased progressive sampling on expected path length.
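For anyone who hasn't seen the two update rules side by side, here is a minimal sketch. It assumes the usual formulation in which each subtree carries a total weight W, the sum of exp(-H) over its states, stored on the log scale. The function names are hypothetical and are not taken from Stan or Walnuts.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Uniform (unbiased) progressive sampling: replace the current draw with
// the new subtree's draw with probability W_new / (W_old + W_new).
inline double uniform_progressive_prob(double log_w_old, double log_w_new) {
  assert(!std::isnan(log_w_old) && !std::isnan(log_w_new));
  // W_new / (W_old + W_new) = 1 / (1 + exp(log_w_old - log_w_new))
  return 1.0 / (1.0 + std::exp(log_w_old - log_w_new));
}

// Biased progressive sampling: replace the current draw with the new
// subtree's draw with probability min(1, W_new / W_old).
inline double biased_progressive_prob(double log_w_old, double log_w_new) {
  assert(!std::isnan(log_w_old) && !std::isnan(log_w_new));
  return std::min(1.0, std::exp(log_w_new - log_w_old));
}
```

With equal weights the uniform rule moves to the new subtree half the time and the biased rule always moves. More generally the biased probability is never smaller than the uniform one, so the selected draw tends to come from the most recently added subtree, which usually lies farther from the starting point.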
