The role of "max_treedepth" in No-U-Turn?

betanalpha · August 30, 2021, 5:53pm

The dynamic Hamiltonian Monte Carlo sampler used in Stan is not the No-U-Turn sampler. Trajectories are still built from binary trees but just about everything else has changed.

Saturating the maximum tree depth is not related to divergent trajectories. Most of the advice that follows unfortunately doesn’t apply.

Hamiltonian Monte Carlo explores a given probability distribution with deterministic trajectories that are able to span large regions of your model configuration (i.e. parameter) space. In practice we can’t construct these trajectories exactly, but we can approximate them with numerical trajectories that consist of a discrete sequence of points. The accuracy of this approximation is determined by a step size. The smaller the step size the more points in the numerical trajectory but the closer the numerical trajectory is to the exact continuous trajectory. The larger the step size the fewer points in the numerical trajectory but the more inaccurate it will be.
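To make the step size trade-off concrete, here is a minimal sketch, not Stan's implementation. It assumes a one-dimensional standard normal target with a unit metric, and uses a plain leapfrog integrator as the numerical scheme. The function names `leapfrog_step`, `energy`, and `simulate` are hypothetical and chosen for illustration only.

```python
import math


def energy(q, p):
    # Hamiltonian for a standard normal target with unit metric:
    # potential q^2 / 2 plus kinetic p^2 / 2.
    return 0.5 * q * q + 0.5 * p * p


def leapfrog_step(q, p, step_size):
    # One leapfrog step. The gradient of the potential q^2 / 2 is q.
    p = p - 0.5 * step_size * q  # half step in momentum
    q = q + step_size * p        # full step in position
    p = p - 0.5 * step_size * q  # half step in momentum
    return q, p


def simulate(step_size, total_time=2.0 * math.pi):
    # Integrate for roughly total_time and report how many steps were
    # needed and the worst energy error seen along the way.
    n_steps = max(1, round(total_time / step_size))
    q, p = 1.0, 0.0
    initial_energy = energy(q, p)
    worst_error = 0.0
    for _ in range(n_steps):
        q, p = leapfrog_step(q, p, step_size)
        worst_error = max(worst_error, abs(energy(q, p) - initial_energy))
    return n_steps, worst_error


coarse_steps, coarse_error = simulate(step_size=0.5)
fine_steps, fine_error = simulate(step_size=0.05)
print(coarse_steps, coarse_error)
print(fine_steps, fine_error)
```

The exact trajectory conserves the energy, so the energy error is a rough proxy for how far the numerical trajectory has drifted from it. In this toy setting the smaller step size needs about ten times as many points to cover the same distance but keeps the error far smaller.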

The performance of Hamiltonian Monte Carlo depends on the length of these trajectories. If the exact trajectory is too short then the sampler explores slowly, and if the trajectory is too long then it can return to where it started and waste time on redundant exploration.

The more complex your target probability distribution, the longer the exact trajectories will need to be to explore it, and the smaller the step size will need to be so that our numerical approximations of those trajectories are sufficiently accurate. In other words, the more complex your target probability distribution, the more discrete steps you’ll need in each trajectory.

Stan uses a dynamic Hamiltonian Monte Carlo sampler that automatically determines how long these trajectories should be based on the behavior of your target probability distribution. If your target distribution is ill-defined and stretches all the way to infinity, however, then Stan would actually try to build an infinitely long trajectory and never stop. To avoid this, as a safety measure, the trajectory size is capped at $2^{\text{max\_treedepth}}$ points, which is 1024 points (1023 leapfrog steps between them) for the default max_treedepth = 10.
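As a rough illustration of how the cap works, the toy sketch below doubles a trajectory until either a stopping rule fires or the maximum tree depth is reached. It is not Stan's algorithm: the function `build_trajectory` and its `should_stop` callback are hypothetical stand-ins for the real termination criterion, and only the point counts are tracked.

```python
def build_trajectory(should_stop, max_treedepth=10):
    # Toy model of trajectory doubling. `should_stop` is a hypothetical
    # callback that takes the current number of points and returns True
    # once the trajectory is long enough.
    depth = 0
    n_points = 1  # the initial point
    while depth < max_treedepth:
        n_points *= 2  # each doubling adds as many points as already exist
        depth += 1
        if should_stop(n_points):
            break
    saturated = depth == max_treedepth
    return depth, n_points, saturated


# A stopping rule that never fires, mimicking a target that stretches
# out to infinity: only the cap ends the trajectory.
never_stops = build_trajectory(lambda n_points: False)
print(never_stops)

# A stopping rule that fires once the trajectory has at least 8 points.
stops_early = build_trajectory(lambda n_points: n_points >= 8)
print(stops_early)
```

With the default cap of 10 the never-stopping case ends at 1024 points, which corresponds to 1023 leapfrog steps between them. Each increment of max_treedepth doubles the worst-case cost per iteration.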

If you’re saturating that maximum trajectory size then your target probability distribution is probably highly degenerate, and may even be non-identifiable. For some more discussion on these two terms and strategies for responding to maximum tree depth warnings see for example Identity Crisis.
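One way to see how often the cap is being hit is to look at the per-iteration tree depths that Stan records in the `treedepth__` sampler diagnostic. The helper below is a minimal sketch that assumes you have already extracted those values into a plain sequence of integers. The name `treedepth_saturation` is hypothetical, and how you extract the values depends on your interface.

```python
def treedepth_saturation(treedepths, max_treedepth=10):
    # Fraction of iterations whose tree depth reached the cap.
    # `treedepths` is assumed to be a sequence of per-iteration
    # treedepth__ values pulled from the sampler diagnostics.
    draws = list(treedepths)
    if not draws:
        raise ValueError("no iterations supplied")
    hits = sum(1 for depth in draws if depth >= max_treedepth)
    return hits / len(draws)


# Made-up values for illustration only, not output from a real fit.
example_depths = [10, 10, 3, 4]
fraction = treedepth_saturation(example_depths)
print(fraction)
```

A handful of saturated iterations and nearly all iterations saturating are very different situations, so the fraction is a useful first thing to check before digging into the model itself.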

There are infinitely many ways that a complicated target probability distribution can trigger this warning, so there’s no immediate fix. Instead you’ll have to investigate your target distribution, as discussed in that link, to identify what the problem is and then respond to that particular problem.
