So I am a bit nervous about this one. I tried to write a post I would have liked to read early in my love affair with Stan. I am compiling the strategies I have used to handle divergences and my understanding of what divergences are. But I am no expert yet and my calculus is rusty, so I am a bit out of my depth and I hope I didn’t write anything stupid. The post is aimed at people whose calculus is also rusty or who have had little exposure to calculus at all. All corrections welcome.
Divergences are not discussed in the original NUTS paper. You’ll likely want to reference instead https://arxiv.org/abs/1701.02434 which has an extensive discussion of divergences and how they relate to the stability of numerical integrators and the geometry of the target distribution.
This is an awesome overview. I wish it was around when I started struggling with divergences. One more piece of advice (but that would ruin the nice list of 10) would be to check for coding errors. From memory:
Leaving an unused unbounded parameter lingering around without a prior can lead to divergences.
Exiting a loop too early: writing for (n in 1:K) instead of for (n in 1:N) with K < N can leave the parameters indexed K+1 to N without a prior, which leads to the same issue as above.
Stupid coding and algebra errors, like a multiplication instead of an addition, a square root instead of a square, exp instead of log, or a missing minus sign, can lead to numerical problems (overflow or NaN) and introduce identification problems.
I have spent some time looking for a statistical problem which was actually a coding/algebra problem.
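To make the first two items concrete, here is a minimal sketch of where they would show up. The model (a Poisson regression with one effect per observation) and all names in it are my own assumptions, picked only for illustration, and the array syntax assumes a reasonably recent Stan version.

```stan
// Hypothetical example model, not taken from the original post.
data {
  int<lower=1> N;                 // number of observations
  array[N] int<lower=0> y;        // observed counts
}
parameters {
  real alpha;                     // intercept on the log scale
  real<lower=0> tau;              // scale of the observation-level effects
  vector[N] eps;                  // one effect per observation
  // real leftover;               // PITFALL: declared but never used below, so it
                                  // would get an improper flat prior on the whole real line
}
model {
  alpha ~ normal(0, 2);
  tau ~ normal(0, 1);
  // PITFALL: writing 1:K here with K < N would leave eps[K+1], ..., eps[N]
  // with neither a prior nor a likelihood term, just like `leftover`.
  for (n in 1:N) {
    eps[n] ~ normal(0, tau);
    y[n] ~ poisson_log(alpha + eps[n]);
  }
}
```

A quick sanity check along these lines is to go through the parameters block and confirm that every declared parameter, and every element of every vector, appears in at least one statement in the model block.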
Correct me if I’m wrong, but you didn’t link to this earlier case study, https://betanalpha.github.io/assets/case_studies/divergences_and_bias.html, which discusses divergences in the context of hierarchical models and how to use their spatial distribution to investigate problems. Might also be a useful reference.
I linked to it as an example for non-centered parametrization. I slightly changed the title of the link to make sure it advertises its content properly.
Ah. Might be worth referencing the link in the context of (5) as well as (7), as it demonstrates the use of pair plots to identify the source of divergences.
Nice to see lots of accessible information about divergent transitions collected in one place!
Given that you were asking for feedback: I would put the simulation of data as one of the first three points (not #9). For me, having to simulate data nearly always leads to a better understanding of “real” data and helps to formulate a better Stan model.
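For anyone who has not simulated data before, here is one way it could look in Stan itself. This is only a sketch under my own assumptions: a simple linear regression, with the "true" values (alpha_true, beta_true, sigma_true are hypothetical names) passed in as data.

```stan
// Hypothetical simulation program: draws fake outcomes from known parameter values.
data {
  int<lower=1> N;                 // number of observations to simulate
  vector[N] x;                    // predictor values
  real alpha_true;                // intercept you choose
  real beta_true;                 // slope you choose
  real<lower=0> sigma_true;       // residual standard deviation you choose
}
generated quantities {
  vector[N] y_sim;                // one simulated data set per draw
  for (n in 1:N) {
    y_sim[n] = normal_rng(alpha_true + beta_true * x[n], sigma_true);
  }
}
```

Since there is no parameters block, this is meant to be run with the fixed_param algorithm. Each draw of y_sim can then be fed to the actual model as data, to see whether the fit recovers the values you chose and whether divergences already appear with data you fully understand.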