Best Practice: Max Treedepth During Warmup

mrinank_sharma · February 9, 2021, 3:58pm

Hi everybody,

I’ve been fitting a large model using NUTS. I find that the maximum treedepth is often exceeded during the warmup stage, for example, reaching treedepths of 17 and 18. Obviously, this makes running the model exceedingly slow.

Surprisingly, I notice the model doesn’t need a large treedepth when sampling - the treedepth never exceeds 11 during sampling. This suggests to me that the posterior geometry is ‘well behaved’, but it seems that NUTS is having some trouble adapting.

What would be best practice in this case? Perhaps limiting the treedepth to a reasonable value during warmup, but not during sampling? Though this could lead to worse adaptation. I’m not sure what reparameterisations could be helpful for this case.

Cheers :)

Ara_Winter · February 9, 2021, 7:01pm

Hi and welcome. A couple of things that will help folks troubleshoot your problem here:
Can you post the model? And the model call to run it?
Can you share a snippet of the data (for running)? Or fake data for folks to play with?
Is this in R? Python? and if so what versions?
Does the model run with fake data and known parameters? And can you recover those parameters?

thanks!

betanalpha · February 18, 2021, 12:00am

If the sampling behavior after the warmup phase is fine then there are two possible problems. The first is that the geometry outside of the target typical set is nasty and Stan’s dynamic Hamiltonian Monte Carlo sampler has to really work with long trajectories to get through that nastiness and find the target typical set. The other is that the default tuning of the step size and inverse metric elements is poor, so the early exploration of the target typical set that informs the adaptation is necessarily slow.

What does the distribution of inverse metric elements look like? If there are strong variations then it would help to rescale your parameters so that the posterior lengthscales are more uniform and the initial tuning is less bad.
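
One way to check for strong variations is to look at how many orders of magnitude the diagonal inverse metric elements span. The following is only a rough sketch in Python: the function name `summarize_inv_metric`, the one-decade cut-off, and the example values are all made up for illustration, and it assumes you have already pulled the adapted diagonal elements out of your fit as a plain list of positive numbers.

```python
import math
from statistics import median


def summarize_inv_metric(inv_metric_diag, threshold_decades=1.0):
    """Summarise the spread of diagonal inverse metric elements.

    inv_metric_diag: positive floats, one per (unconstrained) parameter.
    threshold_decades: flag elements more than this many powers of ten
        away from the median element (an arbitrary illustrative cut-off).
    """
    if not inv_metric_diag or any(v <= 0 for v in inv_metric_diag):
        raise ValueError("expected a non-empty list of positive values")
    logs = [math.log10(v) for v in inv_metric_diag]
    centre = median(logs)
    outliers = [
        i for i, lv in enumerate(logs) if abs(lv - centre) > threshold_decades
    ]
    return {
        "decades_spanned": max(logs) - min(logs),
        "median": 10 ** centre,
        "outlier_indices": outliers,
    }


# Hypothetical values, not taken from any real fit.
example = [1.0, 0.8, 1.3, 2.5e-6, 4.0e3]
summary = summarize_inv_metric(example)
print(summary)
```

Parameters whose indices show up as outliers are the natural candidates for rescaling.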

mrinank_sharma · February 21, 2021, 3:30pm

Hi Mike,

Thanks for your response.

In my case, when I looked at the elements of the (diagonal) inverse mass matrix, the values spanned several orders of magnitude. Rescaling the parameters whose mass matrix values were far from the typical values (e.g., multiplying or dividing by constants) significantly improved the adaptation and reduced the treedepths required during warmup.
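
For anyone reading later, here is a small Python sketch of the idea, not the code I actually used. It assumes each diagonal inverse metric element approximates the posterior variance of the corresponding parameter, so its square root is a plausible rescaling constant. The helper names `suggested_scales` and `rescale_draws` and the simulated draws are hypothetical. In the model itself, this corresponds to sampling a unit-scale `theta_raw` and defining `theta = scale * theta_raw`.

```python
import math
import random
from statistics import pvariance


def suggested_scales(inv_metric_diag):
    """Square root of each element, read as an approximate posterior
    standard deviation (assumes the elements estimate variances)."""
    return [math.sqrt(v) for v in inv_metric_diag]


def rescale_draws(draws, scales):
    """Divide each parameter's draws by its scale, giving the draws of
    theta_raw in the reparameterisation theta = scale * theta_raw."""
    return [[x / s for x in col] for col, s in zip(draws, scales)]


random.seed(0)
true_sds = [1.0, 1e-3, 50.0]  # made-up, badly mismatched scales
draws = [[random.gauss(0.0, sd) for _ in range(2000)] for sd in true_sds]

# Stand-in for the adapted diagonal inverse metric: per-parameter variance.
variances = [pvariance(col) for col in draws]
scales = suggested_scales(variances)

rescaled = rescale_draws(draws, scales)
rescaled_variances = [pvariance(col) for col in rescaled]
print(variances)           # spread over many orders of magnitude
print(rescaled_variances)  # all close to 1 after rescaling
```

After rescaling, every parameter has roughly unit variance, so the sampler's initial (unit) metric is already a reasonable starting point for adaptation.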
