Subtree acceptance probability in base_nuts.hpp:166

Nope. You got it right.

The motivating factor for designing NUTS was trying to maximize expected squared jump distance. The original NUTS paper talks about this and it’s all implicit in their code.

We talk about it in much more detail in our GIST papers and I have what I think is a much cleaner C++ implementation in the Walnuts repository here:

The expected squared jump distance is proportional to (square, square root?) effective sample size. @andrewgelman said this to @matthewdhoffman about a bajillion times as Matt was developing NUTS, and Matt’s solution of using “biased progressive sampling” (not Matt’s name for it) was brilliant. It does have one drawback, though: it tends to boost the effective sample size for parameter estimates, but slightly deflate the effective sample size for variance estimates. I did a post on this ages ago that (I hope) helps visualize this issue with a 1000-dimensional standard normal target:

One thing that’s apparent here is that some step sizes lead to strongly anti-correlated draws, and when you go too far with that, the variance estimates are terrible. So we can’t just maximize expected squared jump distance. We need to balance it, or as Chris Sherlock pointed out in his discussion of his Apogee-to-Apogee sampler, we can try to maximize the expected squared jump distance of our parameters squared rather than of our parameters.

There’s more analysis of this in our Walnuts paper, specifically Figure 3, which shows the result of biased progressive sampling on expected path length.
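
Here’s a minimal sketch of the anti-correlation point above. It assumes a toy Gaussian AR(1) chain as a stand-in for sampler output; it is not NUTS, and the names `ar1_chain`, `esjd`, `square`, and `results` are made up for illustration. For a stationary chain with unit variance and lag-1 correlation rho, the expected squared jump distance of the draws is 2(1 - rho). For this Gaussian toy, the squared draws have lag-1 correlation rho^2, so their expected squared jump distance is 4(1 - rho^2).

```python
import random


def ar1_chain(rho, n, seed=0):
    """Stationary AR(1) chain with N(0, 1) marginals and lag-1 correlation rho."""
    rng = random.Random(seed)
    scale = (1.0 - rho * rho) ** 0.5  # innovation scale that keeps variance at 1
    x = rng.gauss(0.0, 1.0)
    draws = []
    for _ in range(n):
        draws.append(x)
        x = rho * x + scale * rng.gauss(0.0, 1.0)
    return draws


def esjd(draws, f=lambda x: x):
    """Mean squared difference between successive values of f(draw)."""
    vals = [f(x) for x in draws]
    jumps = [(b - a) ** 2 for a, b in zip(vals, vals[1:])]
    return sum(jumps) / len(jumps)


def square(x):
    return x * x


# Anti-correlated, independent, and positively correlated toy chains.
results = {}
for rho in (-0.9, 0.0, 0.9):
    chain = ar1_chain(rho, 200_000)
    results[rho] = (esjd(chain), esjd(chain, square))
    print(f"rho={rho:+.1f}  ESJD(theta)={results[rho][0]:.2f}  "
          f"ESJD(theta^2)={results[rho][1]:.2f}")
```

In this toy, the anti-correlated chain (rho = -0.9) has a larger jump distance in the parameter than independent draws do, but a much smaller jump distance in the squared parameter, which is the quantity that matters for variance estimates. That’s the sense in which maximizing jump distance of the parameters alone can overshoot.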