The role of "max_treedepth" in No-U-Turn?

Stan uses the No-U-Turn Sampler (NUTS), described in [1111.4246] The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo.

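NUTS builds a binary tree by repeatedly doubling the simulated trajectory, taking leapfrog steps forward or backward in time (the direction is picked at random at each doubling). The steps follow Hamiltonian dynamics driven by the gradient of the log-posterior, and the doubling stops once the trajectory starts to turn back on itself (the "U-turn").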

The max_treedepth parameter caps the depth of that binary tree, so it bounds the trajectory length as a power of 2. The default is 10, which means a single iteration can visit at most 2^{10} = 1024 points along the trajectory, i.e. up to 2^{10} - 1 = 1023 leapfrog steps (gradient evaluations). In other words, the tree can have a height of at most 10. Each extra unit of max_treedepth doubles the worst-case cost of an iteration.
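
As a rough sketch of that arithmetic (the helper name trajectory_size is made up for illustration and is not part of Stan), the worst-case cost per iteration for a given tree depth can be tabulated like this, assuming a fully built tree:

```python
def trajectory_size(treedepth):
    """Return (states, leapfrog_steps) for a fully built tree of this depth.

    Hypothetical helper, not a Stan function. It assumes the tree is
    complete; NUTS usually stops earlier, so these are upper bounds.
    """
    if treedepth < 0:
        raise ValueError("treedepth must be non-negative")
    states = 2 ** treedepth      # leaves of a full binary tree of this depth
    leapfrog_steps = states - 1  # the starting point needs no leapfrog step
    return states, leapfrog_steps


for depth in (1, 5, 10, 12):
    states, steps = trajectory_size(depth)
    print(f"treedepth={depth:2d}: up to {states} states, {steps} leapfrog steps")
```

Going from 10 to 12 quadruples the upper bound, which is why raising max_treedepth tends to show up in run time rather than in the results.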

As for how to set max_treedepth, I would heed two sources of advice:

  1. Taming Divergences in Stan Models by @martinmodrak
  2. Divergent transitions - a primer also by @martinmodrak

I keep a personal note on my computer, and unfortunately I no longer know which parts of it are my own and which are copied from someone else. So if I am not giving credit where it is due, let me know so I can fix it.


A divergence arises when the simulated Hamiltonian trajectory departs from the true trajectory, as measured by how far the value of the Hamiltonian drifts from its initial value.
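
To make that concrete, here is a toy sketch, not Stan's implementation: a leapfrog integrator on a standard normal target (potential q^2/2), tracking the largest drift in the Hamiltonian. The function names are made up, and the cutoff of 1000 is my assumption about the energy error at which Stan flags a divergence.

```python
def hamiltonian(q, p):
    # Potential energy q^2/2 (standard normal target) plus kinetic energy p^2/2.
    return 0.5 * q * q + 0.5 * p * p


def leapfrog(q, p, step_size):
    # One leapfrog step; the gradient of the potential q^2/2 is simply q.
    p -= 0.5 * step_size * q  # half step for momentum
    q += step_size * p        # full step for position
    p -= 0.5 * step_size * q  # second half step for momentum
    return q, p


def max_energy_error(step_size, n_steps=50, q0=1.0, p0=0.0):
    # Largest absolute drift of the Hamiltonian from its initial value.
    h0 = hamiltonian(q0, p0)
    q, p = q0, p0
    worst = 0.0
    for _ in range(n_steps):
        q, p = leapfrog(q, p, step_size)
        worst = max(worst, abs(hamiltonian(q, p) - h0))
    return worst


DIVERGENCE_THRESHOLD = 1000.0  # assumed cutoff, for illustration only

for eps in (0.1, 2.5):
    err = max_energy_error(eps)
    print(f"step_size={eps}: max energy error {err:.3g}, "
          f"divergent={err > DIVERGENCE_THRESHOLD}")
```

With the small step size the energy error stays tiny; with a step size beyond the integrator's stability limit for this target (2 here) it blows up. This is the mechanism behind the usual advice that a smaller step size, which a higher adapt_delta leads to, can remove divergences.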

Things to try when you see divergences:

  1. Check your code. Twice. Divergences are almost as likely to be the result of a programming error as of a truly statistical issue. Do all parameters have a prior? Do your array indices and for loops match?
  2. Create a simulated dataset with known true values of all parameters. It is useful for so many things (including checking for coding errors). If the errors disappear on simulated data, your model may be a bad fit for the actual observed data.
  3. Check your priors. If the model is sampling heavily in the very tails of your priors or on the boundaries of parameter constraints, this is a bad sign.
  4. Visualisations: use mcmc_parcoord from the [bayesplot](<https://cran.r-project.org/web/packages/bayesplot/index.html>) package, Shinystan and pairs from rstan. See: Documentation for Stan Warnings (contains a few hints); Case study - diagnosing a multilevel model; Gabry et al. 2017 - Visualization in Bayesian workflow.
  5. Make sure your model is identifiable - non-identifiability and/or multimodality (multiple local maxima of the posterior distribution) is a problem. See: Case study - mixture models; my post on non-identifiable models and how to spot them.
  6. Run Stan with the test_grad option.
  7. Reparametrize your model to make your parameters independent (uncorrelated) and close to N(0,1), i.e. change the actual parameters and compute your parameters of interest in the transformed parameters block.
  8. Try non-centered parametrization - this is a special case of reparametrization that is so frequently useful that it deserves its own bullet. See: Case study - diagnosing a multilevel model; Betancourt & Girolami 2015.
  9. Move parameters to the data block and set them to their true values (from simulated data). Then return them one by one to the parameters block. Which parameter introduces the problems?
  10. Introduce tight priors centered at the true parameter values. How tight do the priors need to be to let the model fit? Useful for identifying multimodality.
  11. Play a bit more with adapt_delta, stepsize and max_treedepth.
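
For item 8, a minimal sketch of the idea outside Stan, using only Python's standard library (the function names and the values of mu and tau are made up for illustration). Instead of drawing theta directly from N(mu, tau), you draw a standard normal z and rescale it; both give the same distribution for theta, but in a Stan model the sampler then only has to explore the well-behaved z.

```python
import random
import statistics


def centered_draw(rng, mu, tau):
    # Centered form: theta ~ N(mu, tau), drawn directly.
    return rng.gauss(mu, tau)


def non_centered_draw(rng, mu, tau):
    # Non-centered form: z ~ N(0, 1), then theta = mu + tau * z.
    z = rng.gauss(0.0, 1.0)
    return mu + tau * z


rng = random.Random(1)  # fixed seed so the run is repeatable
mu, tau = 3.0, 0.5      # illustrative values only

centered = [centered_draw(rng, mu, tau) for _ in range(20000)]
non_centered = [non_centered_draw(rng, mu, tau) for _ in range(20000)]

for name, draws in (("centered", centered), ("non-centered", non_centered)):
    print(f"{name}: mean={statistics.mean(draws):.3f}, "
          f"sd={statistics.stdev(draws):.3f}")
```

In Stan terms, z would live in the parameters block with a standard normal prior, and theta would be computed in the transformed parameters block.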