Sampler Diagnostics

When encountering divergences, I often have the problem that they are spread out all over the posterior distribution (or at least the part of it that was already sampled). This makes it difficult to pin down their cause. Only in a minority of cases have I encountered divergences that are tightly grouped in a certain area, which gives a hint about what causes them.

I think these divergences are so often spread out because of two things:

  1. The location where a divergence is shown is not where the energy error occurred, but the point from which the sampler started moving before it hit the divergence-causing region.
  2. Because HMC is very good at traversing the posterior, the start of the trajectory can be somewhere completely different from where the problem occurred.

Thus, I imagine it would be highly useful if we could get more information about the divergent transition, such as the coordinates of the leapfrog step with the largest energy error, or simply the exact path the sampler took.
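For contrast, here is a minimal sketch (not from the original post) of what is already retrievable today: the per-iteration divergence flags and a scatter plot that highlights divergent iterations. It assumes a fitted rstan object named `fit` and two hypothetical parameter names, `mu` and `tau`. The highlighted points are the iterations' returned draws, which is exactly the limitation described above: they are not the leapfrog step where the energy error occurred.

```r
# Sketch: inspect and plot divergent iterations from a fitted rstan model.
# `fit`, `mu`, and `tau` are placeholders for your own fit and parameters.
library(rstan)
library(bayesplot)

# One matrix per chain; the "divergent__" column flags divergent iterations.
sp <- get_sampler_params(fit, inc_warmup = FALSE)
divergent <- unlist(lapply(sp, function(x) x[, "divergent__"]))
cat("Number of divergent iterations:", sum(divergent), "\n")

# Scatter two parameters and highlight divergent iterations.
# Note: the highlighted points are the iterations' returned draws,
# not the leapfrog step where the energy error actually occurred.
mcmc_scatter(
  as.array(fit),
  pars = c("mu", "tau"),
  np = nuts_params(fit)
)
```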

In some cases there is a region that forces the sampler to make its step size very small, leading to long runtimes and/or treedepth warnings without causing divergences. In that case diagnosing divergences doesn't help to find the problem. It would therefore be helpful to also get the locations and energy errors of the individual leapfrog steps for transitions that were not divergent, to check which parts of the posterior are hardest for the sampler.
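As a complement, here is a hedged sketch (again assuming a hypothetical fitted rstan object `fit`) of the per-iteration diagnostics Stan already exposes, which can be used to spot transitions that hit the maximum tree depth without diverging. It stops at per-iteration granularity, so it does not give the per-leapfrog-step locations and energy errors requested above.

```r
# Sketch: collect per-iteration NUTS diagnostics into one data frame.
library(rstan)

sp <- get_sampler_params(fit, inc_warmup = FALSE)
diag_df <- do.call(rbind, lapply(seq_along(sp), function(chain) {
  data.frame(
    chain      = chain,
    iteration  = seq_len(nrow(sp[[chain]])),
    treedepth  = sp[[chain]][, "treedepth__"],
    n_leapfrog = sp[[chain]][, "n_leapfrog__"],
    energy     = sp[[chain]][, "energy__"],
    divergent  = sp[[chain]][, "divergent__"]
  )
}))

# Iterations that hit the maximum tree depth (default max_treedepth = 10)
# are expensive but not divergent; their draws hint at where a very small
# step size is needed. This is per-iteration info only, not per leapfrog step.
subset(diag_df, treedepth >= 10 & divergent == 0)
```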

I feel these things would greatly help me when debugging and optimizing my models.
Assuming that other people share my experience, especially newcomers to Stan, this could also significantly lower the entry barrier for new Stan users.

I think I once saw an HMC implementation that actually had the feature of showing the sampler path of divergent transitions. Unfortunately I couldn't find the article about it; I vaguely remember it being about PyMC3.


Yeah, I agree with this. I’ve certainly been disappointed after making plots of divergences. Maybe there is something more to be had.

I liked the plots @martinmodrak made here: Getting the location + gradients of divergences (not of iteration starting points) . I think he was getting at the same idea.

@betanalpha makes the point in that thread that divergence-ness is a property of the whole trajectory, not just a point on that trajectory.

I think there were IO issues with getting stuff out of actual Stan. Martin managed it somehow.

My buddy and I have an R version of the NUTS algorithm Stan uses that might make it easier to test this (the algorithm is in R but uses Stan gradients). I dunno. It'll be slow, for sure, but maybe it's still useful: https://github.com/bbbales2/RNUTS/blob/master/rnuts_example.R . There's a little more info on using it here: NUTS misses U-turns, runs in circles until max_treedepth. If you're interested I can tell you more.


That's a fantastic thread over there! They've discussed a lot of aspects there already. I'll read through it all and then post my answer in that old thread.

That sounds interesting, thanks! I'll keep it in mind, but I expect I won't have the time to do something useful with it.
May I suggest that you post this to the other thread you linked above, too? Then this thread here can be closed and we'd have everything in one place.


Eh, keep the new thread. This thread links to it and it’s confusing to go back and try to restart these old monolithic topics. Some of that info on the code might be dated anyway.

Well, if you do, feel free to report back how it goes. Plots are good :D. And if the results are confusing and unclear, that's interesting too.
