Accessible explanation of the No-U-Turn Sampler


#1

In my Bayesian seminar today, we discussed at length how step size and adapt_delta change the way we explore and sample from the posterior distribution. We were looking at the Hoffman & Gelman (2014) paper, but I'm wondering if there is a more intuitive or accessible explanation of what these hyperparameters do, how they affect the way we explore the posterior, what the consequences are, and what the thinking behind them was.

Does anyone know of a blog post, journal article, or explanation elsewhere that explains NUTS in somewhat broader, conceptual terms?


#2

https://arxiv.org/abs/1701.02434


#3

Michael’s conceptual intro is great. I also tried to explain it to non-mathy types in this paper:

I think it’s a nice complement to the other literature, but naturally that’s a biased opinion.


#4

This is perfect, thanks!


#5

This blog post by Richard McElreath is also a very good way to start imo.


#6

Ah, this is great! This is a perfect first introduction to the sampler, with nice interactives that one can use in a seminar.


#7

adapt_delta just sets the target "acceptance rate" for the sampler. A higher target acceptance rate means adaptation will find smaller step sizes. Once warmup's done, the step size is locked in.

How adaptation works has changed across versions. And the notion of a target acceptance rate is now more complicated, since we're no longer using the basic NUTS algorithm.
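To make the "higher target means smaller step size" point concrete, here is a toy sketch of the dual-averaging step-size adaptation described in Hoffman & Gelman (2014). This is not Stan's actual implementation; the function name and the fixed acceptance history are illustrative (in a real sampler the acceptance probabilities depend on the current step size), but the update rule follows the paper's scheme.

```python
import math

def adapt_step_size(accept_probs, target=0.8, eps0=1.0,
                    gamma=0.05, t0=10.0, kappa=0.75):
    """Dual-averaging step-size adaptation in the style of
    Hoffman & Gelman (2014).  `target` plays the role of
    adapt_delta: log(step size) is nudged until the running
    acceptance statistic matches the target."""
    mu = math.log(10.0 * eps0)   # shrinkage point for log step size
    log_eps_bar = 0.0            # averaged iterate, kept after warmup
    h_bar = 0.0                  # running average of (target - accept)
    for m, accept in enumerate(accept_probs, start=1):
        h_bar += (target - accept - h_bar) / (m + t0)
        log_eps = mu - math.sqrt(m) / gamma * h_bar
        eta = m ** (-kappa)
        log_eps_bar = eta * log_eps + (1.0 - eta) * log_eps_bar
    return math.exp(log_eps_bar)

# Same (made-up) acceptance history, two different targets: the
# higher target ends warmup with the smaller step size.
eps_low_target = adapt_step_size([0.7] * 100, target=0.6)
eps_high_target = adapt_step_size([0.7] * 100, target=0.95)
```

Feeding the same acceptance history with a higher target drives the averaged log step size down, which is exactly the effect you see when you raise adapt_delta.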

The main issue you run into is conditioning, the usual bugbear of any kind of gradient-based algorithm. If you get into a region of the posterior where the step size is too large relative to the local curvature, you get divergences. We only use gradient-based (i.e., first-order) approximations of the true posterior curvature, so sometimes we need small step sizes to follow it accurately.
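You can see the divergence mechanism in one dimension. The sketch below (my own illustration, not Stan code) runs the leapfrog integrator on a narrow Gaussian: the Hamiltonian should be nearly conserved along a trajectory, and a large jump in it is what gets flagged as a divergence. For this potential the integrator is stable only when the step size is below roughly twice the posterior scale `sigma`.

```python
def leapfrog_energy_error(eps, n_steps=50, sigma=0.1, q0=0.0, p0=1.0):
    """Integrate Hamiltonian dynamics for a 1-D Gaussian with scale
    `sigma` and report |H(end) - H(start)|.  A huge energy error is
    the signature of a divergent transition."""
    grad_U = lambda q: q / sigma**2               # gradient of U(q) = q^2 / (2 sigma^2)
    H = lambda q, p: 0.5 * p**2 + 0.5 * (q / sigma)**2
    q, p = q0, p0
    h0 = H(q, p)
    for _ in range(n_steps):                      # standard leapfrog steps
        p -= 0.5 * eps * grad_U(q)
        q += eps * p
        p -= 0.5 * eps * grad_U(q)
    return abs(H(q, p) - h0)

# sigma = 0.1, so the stability limit is around eps = 0.2:
small_err = leapfrog_energy_error(0.05)   # stable: tiny energy error
huge_err = leapfrog_energy_error(0.3)     # unstable: energy explodes
```

This is why a highly curved (badly conditioned) region forces a small step size everywhere: the locked-in post-warmup step size has to survive the narrowest part of the posterior.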