Divergence vs hitting max_treedepth

Context: I am trying to optimize the chain lengths (warmup and iter) and other tuning parameters (window, term_buffer, max_treedepth, metric=dense_e vs. diag_e, adapt_delta) to minimize run time while keeping N_Eff (~400+) and Rhat (<= 1.01) at acceptable levels. I have some hard-to-estimate parameters (e.g., w[1] and p[3] in these traceplots: dense54K,600,200,11,0.98,75,50,35.pdf (98.3 KB)) and need a large number of simulated data points (5,400 to 9,000) to get reasonable estimates of the true parameters. To keep the run time to a few hours, I am focusing on warmup=400 to 600 and iter=200 (x 4 chains = 2,400+ total iterations); a sketch of how such a configuration maps onto a sampler call follows the list below. I notice that

  • all runs in my comparison that produced only a small number of divergences or max_treedepth warnings give very similar mean estimates;
  • a specification of tuning parameters that works well (no warnings) for a 9K data sample does not necessarily work for a 5.4K sample (e.g., it may produce a couple of divergences, a few iterations that hit the max_treedepth, or both);
  • sometimes the run time is shorter for a 9K sample than for a 5.4K one, presumably because the larger sample provides more information and helps pin down the hard-to-estimate parameters faster.
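For reference, here is a minimal sketch of how I understand one such configuration would be passed to the sampler via CmdStanPy; the file names are placeholders, and the mapping of adapt_metric_window/adapt_step_size onto CmdStan's window/term_buffer is my reading of the CmdStanPy docs, not something confirmed in this thread:

```python
from cmdstanpy import CmdStanModel

# "model.stan" and "sim_data_5400.json" are placeholder file names.
model = CmdStanModel(stan_file="model.stan")

fit = model.sample(
    data="sim_data_5400.json",
    chains=4,
    iter_warmup=600,          # warmup length being tuned (400-600)
    iter_sampling=200,        # post-warmup draws per chain
    max_treedepth=11,
    adapt_delta=0.98,
    metric="dense_e",         # vs. the default "diag_e"
    adapt_metric_window=75,   # corresponds to CmdStan's `window` (my assumption)
    adapt_step_size=50,       # corresponds to CmdStan's `term_buffer` (my assumption)
)

# CmdStan's built-in checks: divergences, treedepth saturation,
# E-BFMI, R-hat, and effective sample size.
print(fit.diagnose())
print(fit.summary())
```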

Questions:

  • I know there have been related discussions (on divergences and max_treedepth), but I would appreciate a bit more guidance on whether 1 or 2 divergences or max_treedepth warnings are acceptable under certain conditions (I count these as in the sketch after the questions).
    • even this post by @betanalpha mentions that "1 of 10000 iterations ended with a divergence (0.01%)", which is indicative of the divergences being false positives.
  • If two specifications give 1 divergence in the former and 1 instance of hitting the max_treedepth in the latter, is it fair to say that the latter is likely to be less biased than the former?
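The counts above come from the per-draw sampler diagnostics; a small sketch of how I tally them, assuming the CmdStanPy fit object from the earlier sketch and its method_variables() accessor:

```python
import numpy as np

# Per-draw sampler diagnostics, arrays of shape (iter_sampling, chains).
diag = fit.method_variables()

max_depth = 11  # the max_treedepth used for this run
n_divergent = int(np.sum(diag["divergent__"]))
n_hit_max = int(np.sum(diag["treedepth__"] >= max_depth))

print(f"divergent transitions: {n_divergent}")
print(f"iterations that saturated max_treedepth: {n_hit_max}")
```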

I would very much appreciate your feedback. Thanks.

A few divergences are much worse than hitting the maximum treedepth a few times: divergences indicate the sampler is failing to explore part of the posterior and can bias the estimates, whereas hitting max_treedepth only means trajectories were truncated, which costs efficiency rather than validity.
