This is an experiment in using Discourse topics for documentation, as discussed in “Discourse - issue/question triage and need for a FAQ”. The content has yet to receive feedback and tweaks from the broader community. Also, this is a wiki post, so everyone except very new users of the forum can edit this topic; feel free to improve it. The goal of this topic is to give a brief overview of the main points and links to other resources, not a complete treatment of the subject.
Divergent transitions are a signal that there is some sort of degeneracy; along with high Rhat, low n_eff, and “max treedepth exceeded” warnings, they are the basic tools for diagnosing problems with a model. Divergences almost always signal a problem, and even a small number of divergences cannot be safely ignored.
What is a divergent transition?
For some intuition, imagine walking down a steep mountain. If you take too big of a step you will fall, but if you can take very tiny steps you might be able to make your way down the mountain, albeit very slowly. The mountain here is our posterior distribution. A divergent transition signals that Stan was unable to find a step size that would be big enough to actually explore the posterior while still being small enough to not fall. The problem is usually with somehow “uneven” or “degenerate” geometry of the posterior.
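The intuition above can be sketched numerically. Below is a minimal leapfrog integrator (the numerical scheme at the core of Stan's HMC) applied to a standard normal target: with a small step size the simulated energy stays nearly constant, but with too large a step the energy error explodes, which is exactly the kind of failure Stan reports as a divergent transition. This is an illustrative sketch, not Stan's actual implementation:

```python
def leapfrog(q, p, step_size, n_steps):
    """Leapfrog integration for H(q, p) = q^2/2 + p^2/2 (standard normal target)."""
    grad = lambda q: q  # gradient of the potential energy U(q) = q^2 / 2
    for _ in range(n_steps):
        p -= 0.5 * step_size * grad(q)
        q += step_size * p
        p -= 0.5 * step_size * grad(q)
    return q, p

def energy(q, p):
    """Total Hamiltonian energy; conserved exactly by the true dynamics."""
    return 0.5 * q * q + 0.5 * p * p

q0, p0 = 1.0, 1.0
H0 = energy(q0, p0)

# Small step size: energy is nearly conserved, the trajectory explores the posterior.
q, p = leapfrog(q0, p0, step_size=0.1, n_steps=50)
print(abs(energy(q, p) - H0))  # tiny error

# Large step size: the energy error explodes -- "falling off the mountain".
q, p = leapfrog(q0, p0, step_size=2.5, n_steps=50)
print(abs(energy(q, p) - H0))  # astronomically large error
```

During warmup Stan tunes the step size automatically; a divergence means that even the adapted step size was too large for some region of the posterior.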
- Identity Crisis - a rigorous treatment on the causes of divergences, diagnosis and treatment.
- Taming divergences in Stan models - a less rigorous, but hopefully more accessible, intuition on what divergent transitions are.
- Divergent transitions in the Stan reference manual
- A Conceptual Introduction to Hamiltonian Monte Carlo
Strategies to diagnose and resolve divergences
Check your code. Divergences are almost as likely to be the result of a programming error as of a genuinely statistical issue. Do all parameters have a prior? Do your array indices and for loops match?
Create a simulated dataset with known true values of all parameters. It is useful for so many things (including checking for coding errors). If the errors disappear on simulated data, your model may be a bad fit for the actual observed data.
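A simulated dataset can be built in a few lines. The sketch below (plain Python, hypothetical parameter values) simulates data from a simple linear regression with known `true_alpha`, `true_beta`, and `true_sigma`; in practice you would then fit your Stan model to this data and check whether the true values are recovered. A quick least-squares check is included as a sanity check:

```python
import random

# True parameter values we will try to recover from the fit.
true_alpha, true_beta, true_sigma = 1.0, 2.0, 0.5

random.seed(42)
x = [i / 10 for i in range(100)]
y = [true_alpha + true_beta * xi + random.gauss(0, true_sigma) for xi in x]

# Quick least-squares sanity check before running the full Stan model:
n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
beta_hat = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
            / sum((xi - mean_x) ** 2 for xi in x))
alpha_hat = mean_y - beta_hat * mean_x
print(alpha_hat, beta_hat)  # should land close to 1.0 and 2.0
```

If the Stan model diverges even on data simulated exactly from the model, the problem is in the model or its parametrization, not in the real data.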
Reduce your model. Find the smallest / least complex model and a (preferably simulated) dataset that shows the problems. Only add more complexity after you resolve all the issues with the small model. If your model has multiple components (e.g. a linear predictor for parameters in an ODE model), build and test small models where each of the components is separate (e.g. a separate linear model and a separate ODE model with constant parameters).
Make sure your model is identifiable - non-identifiability (i.e. parameters are not well informed by the data, so large changes in parameters can result in almost the same posterior density) and/or multimodality (i.e. multiple local maxima of the posterior distribution) cause problems. Further reading:
- Case study - mixture models
- Identifying non-identifiability - some informal intuition of the concept and examples of problematic models and how to spot them.
- Underdetermined linear regression discusses problems arising when the data cannot inform all parameters.
- Interpretation of cor term from multivariate animal models has an example where a varying intercept at the individual level is not identified.
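Non-identifiability is easy to demonstrate with a toy model. In the sketch below (hypothetical data), the likelihood of `y ~ Normal(a + b, 1)` depends on `a` and `b` only through their sum, so it is perfectly flat along the ridge `a + b = const` and the sampler can wander arbitrarily far along it:

```python
import math

# Toy non-identifiable model: y ~ Normal(a + b, 1).
# Only the sum a + b is informed by the data, so the likelihood is
# constant along the ridge a + b = const.
y = [2.1, 1.8, 2.3, 1.9, 2.0]

def log_lik(a, b):
    mu = a + b
    return sum(-0.5 * (yi - mu) ** 2 - 0.5 * math.log(2 * math.pi) for yi in y)

print(log_lik(0.0, 2.0))     # same value...
print(log_lik(-10.0, 12.0))  # ...for wildly different (a, b)
```

Real cases are rarely this blatant, but soft versions of the same ridge (parameters only weakly distinguished by the data) produce the same degenerate geometry.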
Check your priors. If the model is sampling heavily in the very tails of your priors or on the boundaries of parameter constraints, this is a bad sign.
Avoid overly wide prior distributions, unless really large values of the parameters are plausible. Especially when working on the logarithmic scale (e.g. logistic/Poisson regression), even seemingly narrow priors like `normal(0, 1)` can actually be quite wide (this makes an odds ratio/multiplicative effect of `7.4` still a priori plausible).
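The `7.4` figure comes from exponentiating a value two standard deviations from the mean of a `normal(0, 1)` prior; the same arithmetic shows how much wider a "weakly informative" `normal(0, 5)` prior really is on the odds scale:

```python
import math

# Logistic-regression coefficients act multiplicatively on the odds scale,
# so a prior that looks narrow on the log scale can be very wide in effect.

# Two standard deviations under normal(0, 1):
print(math.exp(2))   # odds ratio ~7.4 still a priori plausible

# Two standard deviations under normal(0, 5):
print(math.exp(10))  # odds ratio ~22026 -- almost certainly too wide
```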
If you have additional knowledge that would let you defensibly constrain your priors use it. Identity Crisis has some discussion of when this can help. However, be careful to not use tighter priors than you can actually justify from background knowledge.
Reparametrize your model to make your parameters independent (uncorrelated), constrained by the data, and close to N(0,1) (i.e. change the actual parameters and compute your parameters of interest in the `transformed parameters` block). Further reading:
- Case study - diagnosing a multilevel model discusses non-centered parametrization which is frequently useful.
- Betancourt & Girolami 2015 - more formal treatment of non-centered parametrization
- Identifying non-identifiability - a sigmoid model shows an example where the parameters are not well informed by the data, while Difficulties with logistic population growth model shows a potential reparametrization.
- Reparametrizing the Sigmoid Model of Gene Regulation shows problems and solutions in an ODE model.
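The most common reparametrization is the non-centered one: instead of sampling `theta ~ normal(mu, tau)` directly, sample `theta_raw ~ normal(0, 1)` and set `theta = mu + tau * theta_raw` (in Stan, this assignment goes in the `transformed parameters` block). The Python sketch below (hypothetical values of `mu` and `tau`) uses plain Monte Carlo to check that both parametrizations imply the same distribution for `theta`, while `theta_raw` is independent of `tau`, removing the funnel-shaped geometry:

```python
import random

random.seed(1)

mu, tau = 3.0, 0.5
n = 200_000

# Centered: sample theta directly given (mu, tau).
centered = [random.gauss(mu, tau) for _ in range(n)]

# Non-centered: sample a standard normal, then shift and scale it.
theta_raw = [random.gauss(0, 1) for _ in range(n)]
non_centered = [mu + tau * z for z in theta_raw]

mean_c = sum(centered) / n
mean_nc = sum(non_centered) / n
print(mean_c, mean_nc)  # both close to mu = 3.0
```

In a hierarchical model `mu` and `tau` are themselves parameters, and the non-centered version lets the sampler explore `theta_raw` on a fixed N(0,1) scale regardless of how small `tau` becomes.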
Move parameters to the `data` block and set them to their true values (from simulated data). Then return them one by one to the `parameters` block. Which parameter introduces the problems?
Introduce tight priors centered at the true parameter values. How tight do the priors need to be for the model to fit? This is useful for identifying multimodality.
Run Stan with the `test_grad` option - it can detect some numerical instabilities in your model.
Play a bit more with `adapt_delta` and `max_treedepth`; see here for an example. Note that increasing `adapt_delta` in particular has become quite common as the go-to first thing people try, and while there are cases where it becomes necessary to increase `adapt_delta` for an otherwise well-behaved model, increasing it without the more rigorous exploration options above can hide pathologies that may impair accurate sampling. Furthermore, increasing `adapt_delta` will certainly slow down sampling. You are more likely to achieve both better sampling performance and a more robust model (not to mention a better understanding thereof) by pursuing the above options and leaving adjustment of `adapt_delta` as a last resort. Increasing `adapt_delta` beyond 0.99 and `max_treedepth` beyond 12 is seldom useful. Also note that for the purpose of diagnosis it is actually better to have more divergences, so reverting to the default settings while diagnosing is recommended.
If you fail to diagnose/resolve the problem yourself, or if you have trouble understanding or executing some of the strategies outlined above, you are welcome to ask here on Discourse; we’ll try to help!