Setting Max Treedepth in difficult high-dimensional models

Increasing the tree depth should always improve effective sample size per iteration, so artificially decreasing it is not recommended. What often happens is that raising the tree depth threshold lets the sampler explore more completely during adaptation (you can check whether your adaptation was being limited by looking at the tree depth of the warmup iterations), which then changes the variance/metric element estimates and, in turn, the final adapted step size. In other words, you were initially missing some nasty valley in your posterior, adapting only to what you were seeing, and getting fast but biased exploration (often you’ll see divergences, but you may need to run longer for the chain to get close enough to the entrance of the valley for divergences to manifest). With a higher tree depth threshold, adaptation is more accurate and you get the smaller step size that you need to explore accurately.
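To make the warmup check concrete, here is a minimal sketch of one way to measure how often warmup iterations hit the tree depth cap. It assumes you have already pulled the per-iteration `treedepth__` sampler diagnostic for the warmup draws into a plain list (how you do that depends on your interface, and you need to have saved the warmup draws). The function name `treedepth_saturation` and the example values are made up for illustration; the default cap of 10 matches Stan's default `max_treedepth`.

```python
from typing import Sequence


def treedepth_saturation(treedepths: Sequence[int], max_treedepth: int = 10) -> float:
    """Return the fraction of iterations whose tree depth reached the cap.

    `treedepths` is assumed to hold the `treedepth__` diagnostic for the
    warmup iterations of a single chain (hypothetical input for this sketch).
    """
    if not treedepths:
        raise ValueError("no iterations supplied")
    # Count iterations where tree building stopped because of the cap.
    hits = sum(1 for depth in treedepths if depth >= max_treedepth)
    return hits / len(treedepths)


# Made-up warmup tree depths for one chain, for illustration only.
warmup_depths = [10, 10, 9, 10, 10, 8, 10, 10, 10, 10]
saturation = treedepth_saturation(warmup_depths, max_treedepth=10)
print(f"{saturation:.0%} of warmup iterations hit the tree depth cap")
```

If a large share of warmup iterations sit at the cap, that suggests adaptation was being limited by the threshold, and rerunning with a higher `max_treedepth` is worth trying.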

The fact that you’re seeing misfit is a good indication that your model is much harder to fit than you had initially thought and that the adaptation was indeed doing the right thing. It’s really, really hard for the adaptation to end up super conservative (low step size, long trajectories) when it’s not needed.