The role of "max_treedepth" in No-U-Turn?

The dynamic Hamiltonian Monte Carlo sampler used in Stan is not the No-U-Turn sampler. Trajectories are still built from binary trees but just about everything else has changed.

Saturating the maximum tree depth is not related to divergent trajectories. Most of the advice that follows unfortunately doesn’t apply.

Hamiltonian Monte Carlo explores a given probability distribution with deterministic trajectories that are able to span large regions of your model configuration (i.e. parameter) space. In practice we can’t construct these trajectories exactly, but we can approximate them with numerical trajectories that consist of a discrete sequence of points. The accuracy of this approximation is determined by a step size. The smaller the step size the more points in the numerical trajectory but the closer the numerical trajectory is to the exact continuous trajectory. The larger the step size the fewer points in the numerical trajectory but the more inaccurate it will be.
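To make the step size trade-off concrete, here is a minimal sketch of a leapfrog integrator, the usual discretization in Hamiltonian Monte Carlo. It assumes a one-dimensional standard normal target with unit mass, so the gradient of the potential is just `q`. The function names and starting values are made up for illustration and are not Stan's implementation.

```python
def leapfrog(q, p, step_size, n_steps):
    """Leapfrog integration for a standard normal target, where the
    potential is U(q) = q^2 / 2 and its gradient is q (an assumption
    made only to keep this sketch short)."""
    for _ in range(n_steps):
        p -= 0.5 * step_size * q   # half step in momentum
        q += step_size * p         # full step in position
        p -= 0.5 * step_size * q   # half step in momentum
    return q, p


def energy(q, p):
    """Hamiltonian: potential plus kinetic energy."""
    return 0.5 * q * q + 0.5 * p * p


def energy_error(step_size, total_time=3.0, q0=1.0, p0=0.5):
    """Integrate for a fixed total time and report the number of steps
    used and the absolute energy error at the end point."""
    n_steps = round(total_time / step_size)
    q, p = leapfrog(q0, p0, step_size, n_steps)
    return n_steps, abs(energy(q, p) - energy(q0, p0))


for eps in (0.5, 0.1, 0.02):
    n, err = energy_error(eps)
    print(f"step size {eps}: {n} steps, energy error {err:.2e}")
```

The exact trajectory conserves energy, so the energy error is one rough measure of how far the numerical trajectory has strayed. Covering the same integration time with a smaller step size costs more steps but leaves a smaller error.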

The performance of Hamiltonian Monte Carlo depends on the size of these trajectories. If the exact trajectory is too short then the sampler explores only slowly, and if the trajectory is too long then it can return to where it started and waste time on redundant exploration.

The more complex your target probability distribution, the longer the exact trajectories will need to be to explore it, and the smaller the step size will need to be so that our numerical approximations of those trajectories are sufficiently accurate. In other words, the more complex your target probability distribution, the more discrete steps you’ll need in each trajectory.
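Both effects can be seen in a toy case. The sketch below again assumes a one-dimensional standard normal target with unit mass, where the exact trajectory is a rotation in phase space and can be written in closed form. The helper names are hypothetical.

```python
import math


def exact_trajectory(q0, p0, t):
    """Exact Hamiltonian flow for a one-dimensional standard normal
    target with unit mass: a rotation in (q, p) phase space."""
    q = q0 * math.cos(t) + p0 * math.sin(t)
    p = p0 * math.cos(t) - q0 * math.sin(t)
    return q, p


def distance_from_start(t, q0=1.0, p0=0.0):
    """How far the position has moved after integration time t."""
    q, _ = exact_trajectory(q0, p0, t)
    return abs(q - q0)


def steps_needed(integration_time, step_size):
    """Number of discrete steps needed to cover the integration time."""
    return math.ceil(integration_time / step_size)


for t in (0.1, math.pi, 2 * math.pi):
    print(f"t = {t:.2f}: distance from start = {distance_from_start(t):.3f}")
```

A very short integration time barely moves, half a period reaches the far side of the distribution, and a full period lands back at the starting point. `steps_needed` is just the ratio of integration time to step size, which grows when a harder target demands both a longer trajectory and a smaller step size.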

Stan uses a dynamic Hamiltonian Monte Carlo sampler that automatically determines how long these trajectories should be based on the behavior of your target probability distribution. If your target distribution is ill-defined and stretches all the way to infinity, however, then Stan would try to build an infinitely long trajectory and never stop. To avoid this, Stan caps the trajectory length at $2^{\text{max\_treedepth}}$ as a safety measure, which comes to 1024 steps for the default max_treedepth = 10.
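The arithmetic behind that cap can be sketched in a few lines. This is only the counting, not Stan's sampler. The helper `trajectory_size` is hypothetical, and it assumes the trajectory starts as a single point and doubles once per level of tree depth.

```python
def trajectory_size(tree_depth):
    """Points and leapfrog steps in a binary tree of the given depth,
    assuming the trajectory starts as a single point and doubles once
    per level of depth."""
    n_points = 1
    for _ in range(tree_depth):
        n_points *= 2           # each doubling adds as many points as already exist
    n_leapfrog = n_points - 1   # the initial point costs no leapfrog step
    return n_points, n_leapfrog


for depth in (1, 5, 10, 12):
    points, steps = trajectory_size(depth)
    print(f"depth {depth}: {points} points, {steps} leapfrog steps")
```

Under this counting a tree of depth 10 holds 1024 points, reached with one fewer leapfrog step because the starting point is free. Each unit added to max_treedepth doubles the worst-case cost of a single iteration, so raising the cap makes saturated iterations more expensive without addressing why the trajectories are so long.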

If you’re saturating that maximum trajectory size then your target probability distribution is probably highly degenerate, and may even be non-identifiable. For some more discussion on these two terms and strategies for responding to maximum tree depth warnings see for example Identity Crisis.
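As a rough way to gauge how often the cap is reached, one can count the iterations whose tree depth equals the maximum. The sketch below assumes the per-iteration tree depths have already been pulled out of the sampler output (Stan reports them in a `treedepth__` column). The helper name and the numbers are made up for illustration.

```python
def saturation_fraction(treedepths, max_treedepth=10):
    """Fraction of iterations whose tree depth reached the cap."""
    if not treedepths:
        raise ValueError("no iterations to check")
    hits = sum(1 for depth in treedepths if depth >= max_treedepth)
    return hits / len(treedepths)


# Made-up tree depths for eight post-warmup iterations, for illustration only.
example_treedepths = [10, 10, 9, 10, 10, 8, 10, 10]
fraction = saturation_fraction(example_treedepths)
print(f"{fraction:.0%} of iterations hit the maximum tree depth")
```

A handful of saturated iterations mostly costs compute time, while a large fraction suggests the kind of degeneracy described above.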

There are infinitely many ways that a target probability distribution can be complicated enough to trigger this warning, so there’s no immediate fix. Instead you’ll have to investigate your target distribution, as discussed in Identity Crisis, to identify what the problem is and then react to that particular problem.
