Reference for current version of NUTS/HMC in stan?

Is there a complete and self-contained reference for the current implementation of Stan’s Monte Carlo inference engine? It is briefly discussed in the documentation which imply that it’s just Hoffman & Gelman’s original NUTS with (just one?) modification regarding the selection of the final location of the trajectory from Betancourt’s 2016 paper. (I admit that I find the latter quite hard to parse, albeit without having given it too much effort!)

But at least in this thread there does seem to be some discussion between @betanalpha, @Bob_Carpenter, and others about whether there are any additional differences from just-plain-NUTS which just cuts off without a definitive conclusion…

3 Likes

No. But I think we should.

4 Likes

The last time I asked around on this, I think we settled on the implementation being what is detailed in https://arxiv.org/pdf/1701.02434.pdf (specifically Appendix A.5) plus this PR: Feature/issue 2799 robust no u turn by betanalpha · Pull Request #2800 · stan-dev/stan · GitHub

I don’t think anything substantial has changed since.

2 Likes

@betanalpha

There is an opens doc issue to try to clarify some of this: Clarify that Stan uses non-standard/improved NUTS · Issue #436 · stan-dev/docs · GitHub

3 Likes

This is correct. To summarize the main changes from the original NUTS algorithm to the current Stan implementation are

  1. The termination criterion has been modified to a more geometrically formal termination criterion based on momenta and not positions.
  2. The termination criterion is checked not just at the boundaries of each binary subtree but also between subtrees. These additional checks help to avoid discretization issues that result in numerical trajectories that are too long.
  3. States are not sampled from numerical trajectories using slice sampling but rather direct multinomial sampling.
  4. The adaptation of the step size still uses dual averaging but the target has been modified based on rigorous theoretical calculations. The adaptation of the inverse metric has also been modified slightly.

I’m consider the diagnostics specific to Hamiltonian Monte Carlo as orthogonal to the algorithm structure.

There is an open pull request on improving the statistic used for the the step size adaptation, but that’s been languishing in pull request hell.

3 Likes

Which pr is that?

So you’re saying that fourth change isn’t part of the current Stan implementation?

I wasn’t going to advertise it on Discourse because IIRC back when that pull request was reverted someone expressed the opinion that maybe we shouldn’t discuss contentious issues on the same forum new users come to ask for help but a couple of days ago I posted a related question on GitHub so if you want to move on from “pull request hell” you could explain these rigorous theoretical calculations there.

Stan currently uses a more effective adaptation target than the original NUTS paper, and there are still better adaptation targets to consider.

I think these responses make it pretty clear that we do need some better documentation on the current algorithms!

(Is the current “more effective adaptation target” actually discussed somewhere?)

5 Likes