The hierarchical continuous time state space models I’ve been working on with the ctsem R package often return some divergent transitions when estimated with Stan, but when I avoid estimating the initial means and covariances of the processes (which may not be very well defined by the data), there are generally no problems. So, given that parameter estimation generally seems to be working well, I’m suggesting the ‘try to eliminate but can also ignore’ approach to dealing with divergences. I am wondering, though, whether there is any chance of relating the divergences to particular parameters in general. Perhaps with parameter dependencies it is a bit of a fruitless idea, but if it could be done, it would be useful to know whether key parameters or relatively unimportant parameters are causing (or are affected by) the divergences. Apologies in advance for probably completely misunderstanding divergences!
I think you’re on the right track trying to track down what parameterizations lead to divergences. Have you seen this case study by Michael Betancourt: http://mc-stan.org/documentation/case-studies/divergences_and_bias.html ? There’s also this thread which is asking a similar question: https://groups.google.com/forum/#!topic/stan-users/RbVvvPW1BY8 .
If you have a fit object, you can pass it to ShinyStan and look through the pairplots in there. That highlights divergent transitions for you as well.
From the sound of it (“estimating the initial means and covariances of the processes (which may not be very well defined by the data)”) you have a pretty good idea of where your problems are, so it’s probably worth starting with those parameters.
I guess it was more a request / suggestion / general wondering about whether there might be a way to include such information in the warning or output. From what I’ve seen, people (myself included) end up ignoring the warnings because whatever model we’ve specified seems to be ‘working’ in general (returning seemingly sensible parameter estimates under test conditions).
Well, if you think you know better and can validate that your model is ‘working’, then go ahead and ignore the warnings. We just no longer offer any promises that the results you are getting are reasonable.
We cannot offer general guidance as to the source of divergences because the modeling language is too rich. There are a near infinite number of possible pathologies and there is no way that we can try to identify them automatically. The fact that we have a general diagnostic capable of identifying issues at all is frankly amazing and sets Hamiltonian Monte Carlo apart from almost all other statistical algorithms.
Hence we are left with our recommended workflow, https://github.com/stan-dev/stan/wiki/Stan-Best-Practices. With great modeling power comes great modeling responsibility and what not.
You can investigate divergences in ShinyStan (even if you didn’t fit in RStan). It lets you visualize one, two, or three parameters at a time and highlights the divergent transitions in the scatterplots.
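If you want a rough numeric version of what those scatterplots show, here is a minimal sketch in Python. It is not a Stan or ShinyStan feature; `divergence_shift` is a hypothetical helper, and the draws and the divergent flags are simulated purely for illustration. It measures, per parameter, how far the divergent draws sit from the rest in units of posterior standard deviation.

```python
import random
import statistics

def divergence_shift(draws, divergent):
    """(mean of divergent draws - mean of the others) / sd of all draws, per parameter."""
    shifts = {}
    for name, values in draws.items():
        div = [x for x, d in zip(values, divergent) if d]
        ok = [x for x, d in zip(values, divergent) if not d]
        if not div or not ok:
            continue  # nothing to compare for this parameter
        sd = statistics.pstdev(values)
        shifts[name] = (statistics.mean(div) - statistics.mean(ok)) / sd
    return shifts

# Simulated stand-ins for posterior draws (hypothetical parameter names).
random.seed(1)
n = 2000
log_tau = [random.gauss(0.0, 1.0) for _ in range(n)]
beta = [random.gauss(0.0, 1.0) for _ in range(n)]
# Pretend the sampler diverged whenever log_tau was small.
divergent = [t < -1.5 for t in log_tau]

shifts = divergence_shift({"log_tau": log_tau, "beta": beta}, divergent)
for name, s in sorted(shifts.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {s:+.2f}")
```

A large shift only says where in parameter space the divergences cluster. It does not single out one parameter as the cause, since the failure is in the trajectory as a whole.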
The divergences arise because the Hamiltonian simulation fails numerically. It doesn’t just affect one parameter; it terminates the iteration, and you can be left with a system that both mixes poorly (because you can no longer follow the Hamiltonian, which is where the good mixing comes from) and is biased if the divergences aren’t random. Typically we see that in the neck of a funnel in a hierarchical model, for instance, when we can’t set the step size low enough to follow the Hamiltonian in the neck and still make progress on the rest of the funnel.
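To make the funnel point concrete, here is a small Python sketch under stated assumptions: a two-parameter funnel with v ~ Normal(0, 3) and x ~ Normal(0, exp(v / 2)), a plain leapfrog integrator with unit mass, and a fixed starting momentum. The function names, step size, and starting points are all chosen for illustration and are not taken from Stan. It tracks how far the energy drifts along one trajectory started in the mouth and one started in the neck, using the same step size for both.

```python
import math

def potential(v, x):
    # Negative log density of the funnel, up to a constant.
    return v * v / 18.0 + v / 2.0 + 0.5 * x * x * math.exp(-v)

def gradient(v, x):
    e = math.exp(-v)
    return v / 9.0 + 0.5 - 0.5 * x * x * e, x * e

def max_energy_error(v, x, eps, n_steps, pv=0.0, px=1.0):
    """Run leapfrog from (v, x) and return the largest |H - H0| along the way."""
    h0 = potential(v, x) + 0.5 * (pv * pv + px * px)
    worst = 0.0
    for _ in range(n_steps):
        gv, gx = gradient(v, x)
        pv -= 0.5 * eps * gv          # half step for momentum
        px -= 0.5 * eps * gx
        v += eps * pv                 # full step for position
        x += eps * px
        gv, gx = gradient(v, x)
        pv -= 0.5 * eps * gv          # second half step for momentum
        px -= 0.5 * eps * gx
        h = potential(v, x) + 0.5 * (pv * pv + px * px)
        worst = max(worst, abs(h - h0))
    return worst

# Same step size, two starting points.
mouth = max_energy_error(v=2.0, x=0.0, eps=0.1, n_steps=10)
neck = max_energy_error(v=-8.0, x=0.0, eps=0.1, n_steps=10)
print(f"mouth energy error: {mouth:.6f}")
print(f"neck energy error:  {neck:.1f}")
```

In the mouth the energy is nearly conserved. In the neck the curvature in x is so high that the same step size is unstable and the energy error explodes, which is the kind of numerical failure a divergence warning is reporting. Shrinking the step size enough to fix the neck would make exploring the mouth very slow.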
Oh, and if you want to verify you’re getting the right results, here’s a good way to do it:
http://andrewgelman.com/2017/04/12/bayesian-posteriors-calibrated/
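As a minimal sketch of one way such a calibration check can be run (my own toy setup, not code from the linked post): draw a "true" parameter from the prior, simulate data from the model, compute the posterior, and record whether the central 50% posterior interval covers the true value. Repeated many times, coverage should come out near 50%. The model here is an assumed conjugate normal one (mu ~ Normal(0, 1), y ~ Normal(mu, 1)), so the posterior is available in closed form; with a real model you would replace those two lines with a Stan fit and quantiles of the draws.

```python
import random
from statistics import NormalDist

random.seed(2017)
z = NormalDist().inv_cdf(0.75)   # half-width of a central 50% interval, in posterior sds
n_sims, n_obs = 2000, 5

hits = 0
for _ in range(n_sims):
    mu = random.gauss(0.0, 1.0)                         # true value drawn from the prior
    y = [random.gauss(mu, 1.0) for _ in range(n_obs)]   # data simulated from the model
    post_mean = sum(y) / (n_obs + 1)                    # closed-form posterior mean
    post_sd = (1.0 / (n_obs + 1)) ** 0.5                # closed-form posterior sd
    hits += abs(mu - post_mean) <= z * post_sd          # did the 50% interval cover mu?

coverage = hits / n_sims
print(f"50% interval coverage: {coverage:.3f}")
```

If the sampler were biased, for example by non-random divergences, you would expect the coverage from real fits to drift away from the nominal 50%.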