Getting the location + gradients of divergences (not of iteration starting points)

Thanks for the feedback. The point that it has been tried before with negative results is an important one. Could you provide more details/thoughts on why terminal points of divergences (or other diagnostics you’ve tried) didn’t provide useful additional information, and for which kinds of models?

However, the current state of affairs is that Stan summarizes the whole trajectory with a single point that is chosen to maintain properties of the Markov chain, not to be informative. I find it hard to believe that it would be difficult to improve on that (e.g. by summarizing the trajectory by two points). I am not searching for a universal diagnostic - just a better heuristic than what we have now. I am open to being convinced by your experience that this is not possible, but the arguments put forth so far are IMHO not very strong.

I do understand, however, that there are costs to adding diagnostics, and I guess you are arguing more along the lines that more diagnostics are not worth the cost. I see three ways new diagnostics can be costly: maintenance, performance and developer time. I think all of those costs can be kept low while the potential gain is non-trivial, but I am open to being proven wrong.

Maintenance: The diagnostic code has to be kept up to date, including the plumbing to the interfaces. @sakrejda helpfully pointed out that the main bottleneck to adding diagnostics is the output plumbing from Stan, which was already in need of a refactor (see this thread: Proposal for consolidated output). If this works as intended, adding a new stream of data could really be just a few lines of code in both Stan core and the interfaces, as all the serialization and labelling would be done once. This should also keep the maintenance cost low.

Performance: This is a serious one. You correctly point out that some decisions would have to be made at runtime, as requiring recompilation of the model whenever different diagnostics are desired is not very user-friendly. I don’t understand the C++ internals well enough to guess whether that might affect compiler optimizations (and I guess this would be model-dependent anyway). We can, however, make a simple static (compile-time) switch between “standard output” and “advanced diagnostics” - both variants of the model could probably be compiled at the same time with little additional compilation effort.

Developer time: I currently have some time to work on refactoring the output and then testing additional diagnostics, as this issue happens to be close to my heart. There is, however, a non-negligible cost to the core team as well - some supervision will be required. I understand if you believe this is a commitment that the team should not make at this time. You may also believe that the Stan community would benefit more from my investing time in something else - if so, do you have some suggestions?

Going forward, I see at least two ways to gather more information than just the sample + end of divergent transition, if that proves not to be useful enough:

  1. Saving all steps is problematic. However, if we output the RNG state and unconstrained params (and maybe a few other bits) per iteration, it should be possible to rerun an iteration on demand (e.g. only for divergent iterations, and possibly only for the ones of interest). This would not be useful for large models where a single iteration takes a long time (but storing everything is not an option for such models anyway).

  2. We could also dump points where the difference in the Hamiltonian first exceeds a certain value (e.g. 1/4 and 1/2 of the threshold for divergence). This would be cheap and have constant overhead in output size.
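To make the two options concrete, here is a minimal toy sketch in Python. It is not Stan code and does not use any Stan API: the target (a standard normal), the plain fixed-length leapfrog loop, the function name `run_trajectory`, and the use of an integer seed as a stand-in for a serialized RNG state are all my assumptions for illustration. The divergence threshold is a plain parameter here; the default of 1000 is just a placeholder. The function records the first point where the energy error exceeds each requested fraction of the threshold (option 2), and because the trajectory is fully determined by the starting point and the seed, calling it again with `keep_path=True` replays the same iteration with the full path (option 1).

```python
import numpy as np


def potential(q):
    # Toy target (assumption): standard normal, U(q) = 0.5 * q.q
    return 0.5 * float(q @ q)


def grad_potential(q):
    return q


def run_trajectory(q0, seed, step_size, n_steps,
                   divergence_threshold=1000.0,
                   fractions=(0.25, 0.5), keep_path=False):
    """Run one fixed-length leapfrog trajectory from q0.

    The integer `seed` stands in for the stored RNG state: together with
    q0 it fully determines the trajectory, so the call can be repeated.
    """
    rng = np.random.default_rng(seed)
    q = np.array(q0, dtype=float)
    p = rng.standard_normal(q.shape)  # momentum resampled from the RNG
    h0 = potential(q) + 0.5 * float(p @ p)

    crossings = {}  # fraction -> (step index, position at first crossing)
    path = [q.copy()] if keep_path else None
    divergent = False
    steps_taken = 0

    for step in range(1, n_steps + 1):
        # One leapfrog step: half momentum, full position, half momentum.
        p = p - 0.5 * step_size * grad_potential(q)
        q = q + step_size * p
        p = p - 0.5 * step_size * grad_potential(q)
        steps_taken = step

        err = abs(potential(q) + 0.5 * float(p @ p) - h0)
        for f in fractions:
            # Keep only the FIRST point exceeding each fraction, so the
            # extra output has a fixed size per iteration.
            if f not in crossings and err > f * divergence_threshold:
                crossings[f] = (step, q.copy())
        if keep_path:
            path.append(q.copy())
        if err > divergence_threshold:
            divergent = True
            break

    return {"divergent": divergent, "n_steps_taken": steps_taken,
            "crossings": crossings, "end": q, "path": path}
```

For example, `run_trajectory([1.0, 1.0], seed=42, step_size=2.5, n_steps=50)` uses a step size that is unstable for this target, so it diverges and reports the 1/4 and 1/2 crossing points; repeating the call with `keep_path=True` gives the same crossings plus every intermediate position. In the real sampler the trajectory is built by NUTS rather than a fixed-length loop, so this only shows the bookkeeping, not the actual plumbing.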
