Recommended process for diagnosing high tree depth?

There are several potential causes of high tree depth:

  • Highly correlated parameters
  • Fat tails

Is there a recommended process for isolating whether a small number of parameters are causing this?

E.g., for the second of these, you might look at the sample kurtosis of the unconstrained parameters and try to identify the ones with high kurtosis.
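
For example, something like this (a rough sketch; it assumes the draws have already been pulled into a NumPy array and mapped to the unconstrained scale, e.g. log for positive-constrained parameters):

```python
# Rough sketch, not a full diagnostic: rank parameters by excess kurtosis.
# Assumes `draws` is a NumPy array of shape (n_draws, n_params) of
# already-unconstrained posterior draws and `names` lists the matching
# parameter names.
import numpy as np
from scipy.stats import kurtosis

def rank_by_kurtosis(draws: np.ndarray, names: list[str], top: int = 10):
    """Return the `top` parameters with the largest excess kurtosis."""
    k = kurtosis(draws, axis=0, fisher=True)  # ~0 for a Gaussian
    order = np.argsort(k)[::-1]               # largest kurtosis first
    return [(names[i], float(k[i])) for i in order[:top]]
```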

Thank you!

Another cause of high tree depth is varying curvature in the posterior: the sampler has to pick a step size small enough for the most highly curved region, and that small step size then requires many leapfrog steps to cross the flatter parts of the posterior. This typically gets introduced through hierarchical priors that have not been reparameterized to use the non-centered parameterization.
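
As a quick sketch of what that looks like for a hierarchical normal prior (with \mu and \tau as the population location and scale), the centered form

\theta_n \sim \textrm{normal}(\mu, \tau)

becomes, in non-centered form,

\tilde{\theta}_n \sim \textrm{normal}(0, 1), \qquad \theta_n = \mu + \tau \, \tilde{\theta}_n,

so the prior no longer couples the individual \theta_n to the scale \tau and the funnel geometry goes away.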

I’d suggest starting with simple models and building up to where the problem first appears. Then you’ll know where the problem gets introduced. Barring that, you can look at the ESS results for individual parameters; the ones that mix poorly will have low ESS, though low ESS isn’t necessarily tied to high tree depth. You can also look at pairs plots to see if there are problems like banana shapes (multiplicative non-identifiability), non-axis-aligned cigar shapes (correlation causing problems), or funnel shapes (typically hierarchical models with centered parameterizations and not much data).
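
For example (a minimal sketch, assuming the fit has been loaded into an ArviZ InferenceData object called idata, and with alpha, beta, and tau as placeholders for your own parameter names):

```python
# Minimal sketch, assuming `idata` is an ArviZ InferenceData object
# (e.g. built with az.from_cmdstanpy); alpha, beta, tau are placeholders.
import arviz as az

# Bulk ESS for every parameter; the poor mixers show up with low values.
print(az.ess(idata))

# Pairs plot for a few suspect parameters: look for bananas, tilted
# cigars, and funnels.
az.plot_pair(idata, var_names=["alpha", "beta", "tau"], kind="scatter")
```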

If the problem is high correlation, that’s often a problem with identifiability. For that, I’d recommend @betanalpha’s case study:

https://betanalpha.github.io/assets/case_studies/identifiability.html

This is the most relevant figure from that case study:

The optimal integration time \tau is determined by the longest length scale of a posterior density function while the step size that ensures stable numerical integration is determined by the shortest length scale. The total number of leapfrog steps for optimal numerical Hamiltonian trajectories is then given by

L = \frac{ \tau }{ \epsilon }.

Saturating the maximum tree depth, i.e. the maximum L, can then be caused either by \tau being too large, \epsilon being too small, or a combination of the two.
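
To put numbers on that: the default max_treedepth of 10 caps a trajectory at roughly 2^{10} = 1024 leapfrog steps, so if adaptation settles on a step size of, say, \epsilon = 10^{-3}, the sampler can only integrate out to \tau \approx 1 before saturating the tree.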

Small \epsilon can be caused by numerical instabilities in the model (for example, regions of high curvature) or by inaccurate gradient calculations (for example, from an ODE solver whose tolerances are too loose). The former will typically be flagged with divergence warnings, while the latter manifests as the accept_stat distribution concentrating far below the target adapt_delta, which defaults to 0.8. If I see divergences accompanying treedepth warnings then I go hunting for regions of high curvature, and if I see odd accept_stat behavior, or if the model uses integrate_ode or integrate_1d or algebra_solver or even the reject function, then I investigate potential numerical tolerance problems.
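
For example, a quick way to eyeball those diagnostics (a sketch assuming `fit` is a cmdstanpy CmdStanMCMC object; the double-underscore columns are CmdStan's own sampler diagnostics):

```python
# Quick check of the sampler diagnostics from a cmdstanpy fit.
d = fit.draws_pd()  # parameters plus accept_stat__, divergent__, treedepth__, ...

print("mean accept_stat:", d["accept_stat__"].mean(),
      "(compare against adapt_delta, default 0.8)")
print("divergences:", int(d["divergent__"].sum()))
print("iterations at max treedepth:", int((d["treedepth__"] >= 10).sum()))  # 10 = default
```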

If maximum treedepth warnings arise by themselves, without any other co-occurring warnings, then it’s probably a nasty degeneracy. One way to isolate potentially relevant parameters is to compare behavior before and after increasing max_treedepth from its default value of 10. If you see the breadth of some parameters increase substantially, then they are likely involved in the degeneracy.
That said, most degeneracies tend to involve multiple, if not all, of the parameters, and there’s no way to disentangle the parameters involved without exploiting the structure of the particular model. That isn’t easy, but it’s typically the most productive way to investigate high-dimensional degeneracies.
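
A sketch of the before/after comparison suggested above (assuming `model` is a cmdstanpy CmdStanModel and `data` is its data dictionary; the treedepth values and cutoffs are illustrative):

```python
# Refit with a larger max_treedepth and compare how much each parameter's
# posterior spread changes; parameter columns are everything that is not a
# double-underscore sampler diagnostic.
fit_default = model.sample(data=data, max_treedepth=10)  # default cap
fit_deep = model.sample(data=data, max_treedepth=14)     # much longer trajectories

d0 = fit_default.draws_pd()
d1 = fit_deep.draws_pd()
params = [c for c in d0.columns if not c.endswith("__")]

# Parameters whose posterior spread grows the most when the cap is lifted
# are the prime suspects for the degeneracy.
ratio = {p: d1[p].std() / d0[p].std() for p in params}
for p, r in sorted(ratio.items(), key=lambda kv: kv[1], reverse=True)[:10]:
    print(f"{p}: sd ratio {r:.2f}")
```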