Blog: Taming Divergences in Stan Models


#1

So I am a bit nervous about this one. I tried to write the post I would have liked to read early in my love affair with Stan. It collects the strategies I have used to handle divergences, together with my understanding of what divergences are. But I am no expert yet and my calculus is rusty, so I am a bit out of my depth and I hope I didn’t write anything stupid. The post is aimed at people whose calculus is also rusty or who have had little exposure to calculus at all. All corrections welcome.

http://www.martinmodrak.cz/2018/02/19/taming-divergences-in-stan-models/


#2

Divergences are not discussed in the original NUTS paper. You’ll likely want to reference https://arxiv.org/abs/1701.02434 instead, which has an extensive discussion of divergences and how they relate to the stability of numerical integrators and the geometry of the target distribution.
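
To make the link between divergences and integrator stability concrete, here is a minimal sketch in plain Python (not Stan's actual implementation). It runs a unit-mass leapfrog integrator on a one-dimensional Gaussian target and tracks the energy error. The function name, the starting point and the step sizes are assumptions chosen for illustration only. With the same step size, the integrator is stable on a wide target and blows up on a narrow (high-curvature) one, which is the kind of failure that gets reported as a divergence.

```python
def leapfrog_energy_error(sigma, step_size, n_steps=20):
    """Largest absolute energy error seen while simulating the target
    N(0, sigma^2) with a unit-mass leapfrog integrator (illustrative only)."""
    q, p = sigma, 0.0  # start one standard deviation from the mode, at rest

    def energy(q, p):
        # potential (negative log density up to a constant) plus kinetic energy
        return 0.5 * (q / sigma) ** 2 + 0.5 * p ** 2

    initial = energy(q, p)
    worst = 0.0
    for _ in range(n_steps):
        p -= 0.5 * step_size * q / sigma ** 2  # half step for the momentum
        q += step_size * p                     # full step for the position
        p -= 0.5 * step_size * q / sigma ** 2  # half step for the momentum
        worst = max(worst, abs(energy(q, p) - initial))
    return worst

# Same step size, two different curvatures
stable = leapfrog_energy_error(sigma=1.0, step_size=0.1)
unstable = leapfrog_energy_error(sigma=0.01, step_size=0.1)
print("wide target  :", stable)
print("narrow target:", unstable)
```

In this toy setting the leapfrog steps are only stable while the step size is smaller than 2 * sigma, so a step size tuned for the wide part of a posterior can fail badly in a narrow part.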


#3

Good point, reference added, thanks.


#4

This is an awesome overview. I wish it was around when I started struggling with divergences. One more piece of advice (but that would ruin the nice list of 10) would be to check for coding errors. From memory:

  • Leaving an unused unbounded parameter lingering around without a prior can lead to divergences.
  • Exiting a loop too early: (for n in 1:K) instead of (for n in 1:N) with K < N leaves some elements untouched and leads to the same issue as above.
  • Stupid coding and algebra errors, such as a multiplication instead of an addition, a square root instead of a square, exp instead of log, or a missing minus sign, can lead to numerical problems (overflow or NaN) and introduce identification problems.
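
The loop bug can be sketched in a few lines of Python standing in for a Stan model block. The names (log_density, n_looped, theta) are hypothetical and the standard normal prior is an assumption for illustration. When the loop stops at K, the log density does not change at all as the remaining elements move, so nothing pulls them back towards sensible values.

```python
import math

def log_density(theta, n_looped):
    """Log density in which only the first n_looped elements of theta
    receive a N(0, 1) prior (illustrative stand-in for a model block)."""
    total = 0.0
    for n in range(n_looped):  # the bug: n_looped is K when it should be N
        total += -0.5 * theta[n] ** 2 - 0.5 * math.log(2 * math.pi)
    return total

N, K = 5, 3
theta = [0.1] * N
far_away = theta[:-1] + [1e6]  # move only the last element, which the buggy loop skips

buggy_change = log_density(far_away, K) - log_density(theta, K)
fixed_change = log_density(far_away, N) - log_density(theta, N)
print("buggy loop:", buggy_change)  # the density is flat in the skipped element
print("fixed loop:", fixed_change)  # the prior strongly penalizes the move
```

The same check works as a quick debugging aid: perturb one parameter at a time and see whether the log density reacts.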

I have spent some time looking for a statistical problem which was actually a coding/algebra problem.


#5

That has very much been my experience as well, but it didn’t occur to me to include it. Putting this as a proud #1 :-) - Thanks!


#6

Correct me if I’m wrong, but you didn’t link to this earlier case study, https://betanalpha.github.io/assets/case_studies/divergences_and_bias.html, which discusses divergences in the context of hierarchical models and how to use their spatial distribution to investigate problems. Might also be a useful reference.


#7

I linked to it as an example for non-centered parametrization. I slightly changed the title of the link to make sure it advertises its content properly.


#8

Ah. Might be worth referencing the link in the context of (5) as well as (7), since it demonstrates the use of pair plots to identify the source of divergences.


#9

Nice to see lots of accessible information about divergent transitions collected in one place!
Given that you were asking for feedback: I would put the simulation of data as one of the first three points (not #9). For me, having to simulate data nearly always leads to a better understanding of the “real” data and helps to formulate a better Stan model.
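
As a rough illustration of what simulating data can look like, here is a minimal Python sketch for a simple hierarchical model. The model structure and all "true" values (true_mu, true_tau, sigma, the group sizes) are assumptions made up for this example, not anything taken from the blog post. The idea is to generate data from known parameters, fit the Stan model to it, and check whether the known values are recovered.

```python
import random
import statistics

random.seed(42)

# Hypothetical "true" values, chosen only for illustration
true_mu, true_tau, sigma = 1.0, 0.5, 2.0
n_groups, n_per_group = 8, 20

# Draw group-level effects, then observations within each group
group_effects = [random.gauss(true_mu, true_tau) for _ in range(n_groups)]
y = [[random.gauss(effect, sigma) for _ in range(n_per_group)]
     for effect in group_effects]

# Quick sanity check before fitting: sample means should sit near the effects
group_means = [statistics.mean(obs) for obs in y]
for effect, mean in zip(group_effects, group_means):
    print(f"true effect {effect:6.2f}   sample mean {mean:6.2f}")
```

If the model still produces divergences on data simulated exactly as the model assumes, the problem is more likely in the model or its parametrization than in the real data.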