Accessible explanation of the No-U-Turn Sampler

adapt_delta just sets the target “acceptance rate” for the sampler. A higher target acceptance rate means adaptation will settle on a smaller step size. Once warmup’s done, the step size is locked in for the rest of sampling.
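To make the link between the target and the step size concrete, here is a minimal sketch. It is not Stan's adaptation (Stan uses dual averaging inside NUTS); it assumes plain HMC on a standard normal target with a simple Robbins-Monro update of the log step size, and the names `adapt_step_size` and `target_accept` are made up for illustration.

```python
import math
import random


def leapfrog(q, p, eps, n_steps):
    """Leapfrog integrator for a standard normal target, where grad(-log density) = q."""
    q, p = list(q), list(p)
    p = [pi - 0.5 * eps * qi for pi, qi in zip(p, q)]  # initial half step for momentum
    for step in range(n_steps):
        q = [qi + eps * pi for qi, pi in zip(q, p)]  # full step for position
        # Full momentum step, except a half step at the very end.
        scale = eps if step < n_steps - 1 else 0.5 * eps
        p = [pi - scale * qi for pi, qi in zip(p, q)]
    return q, p


def energy(q, p):
    """Hamiltonian: potential plus kinetic energy for a standard normal target."""
    return 0.5 * sum(x * x for x in q) + 0.5 * sum(x * x for x in p)


def adapt_step_size(target_accept, dim=20, n_warmup=1000, seed=0):
    """Toy warmup: nudge log(eps) up when acceptance beats the target, down otherwise."""
    rng = random.Random(seed)
    q = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    log_eps = math.log(0.1)
    tail = []
    for i in range(1, n_warmup + 1):
        eps = math.exp(log_eps)
        p = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        n_steps = rng.randint(1, 10)  # jittered trajectory length (an assumption of this toy)
        q_new, p_new = leapfrog(q, p, eps, n_steps)
        delta_h = energy(q_new, p_new) - energy(q, p)
        if not math.isfinite(delta_h):
            accept_prob = 0.0  # numerical blow-up counts as a rejection
        elif delta_h <= 0:
            accept_prob = 1.0
        else:
            accept_prob = math.exp(-delta_h)
        if rng.random() < accept_prob:
            q = q_new
        # Robbins-Monro update with a decaying learning rate.
        log_eps += i ** -0.6 * (accept_prob - target_accept)
        if i > n_warmup // 2:
            tail.append(log_eps)
    # Average the second half of warmup to smooth out noise.
    return math.exp(sum(tail) / len(tail))


if __name__ == "__main__":
    for target in (0.6, 0.8, 0.95):
        print(target, adapt_step_size(target))
```

Running it with a few targets should show the adapted step size shrinking as the target goes up, which is the same direction of effect you get from raising adapt_delta.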

How adaptation works has changed across versions. And the target acceptance is now more complicated to interpret, because we’re not using the basic NUTS algorithm.

The main issue you run into is conditioning, the usual bugbear of any kind of gradient-based algorithm. If the sampler gets into a region of the posterior where the step size is too large for the local curvature, you get divergences. We only use gradient-based (i.e., first-order) approximations of the real posterior curvature, so sometimes we need small step sizes for those approximations to be accurate.
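A small sketch of why conditioning matters: for a Gaussian target, the leapfrog integrator is only stable when the step size is below twice the smallest standard deviation. The example below assumes a toy two-dimensional Gaussian with scales 1 and 0.01 and tracks the worst energy error along a trajectory. The function name `max_energy_error` is hypothetical, and this is an illustration of the mechanism, not how Stan detects divergences internally.

```python
def max_energy_error(scales, eps, n_steps=20):
    """Largest |H - H0| along a leapfrog trajectory on a zero-mean Gaussian
    with independent coordinates whose standard deviations are `scales`."""

    def hamiltonian(q, p):
        potential = 0.5 * sum((qi / s) ** 2 for qi, s in zip(q, scales))
        kinetic = 0.5 * sum(pi * pi for pi in p)
        return potential + kinetic

    def grad(q):
        # Gradient of the negative log density.
        return [qi / s ** 2 for qi, s in zip(q, scales)]

    q = list(scales)  # start one standard deviation out in each coordinate
    p = [1.0] * len(scales)
    h0 = hamiltonian(q, p)
    worst = 0.0
    for _ in range(n_steps):
        p = [pi - 0.5 * eps * gi for pi, gi in zip(p, grad(q))]  # half step momentum
        q = [qi + eps * pi for qi, pi in zip(q, p)]              # full step position
        p = [pi - 0.5 * eps * gi for pi, gi in zip(p, grad(q))]  # half step momentum
        worst = max(worst, abs(hamiltonian(q, p) - h0))
    return worst


if __name__ == "__main__":
    scales = [1.0, 0.01]  # badly conditioned: scales differ by a factor of 100
    for eps in (0.1, 0.01):
        print(eps, max_energy_error(scales, eps))
```

With the larger step size the energy error explodes because the narrow direction is unstable, while the smaller step size keeps it bounded. The price is that the wide direction now needs many more steps to traverse, which is why poor conditioning hurts so much.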
