Status: Draft
This is an experiment in using Discourse topics for documentation, as discussed at "Discourse issue/question triage and need for a FAQ". The content has yet to receive feedback and tweaks from the broader community. Also, this is a wiki post, so everyone except the very newest users of the forum can edit this topic; feel free to improve it. The goal of this topic is to give a brief overview of the main points and links to other resources, not a complete treatment of the subject.
Divergent transitions are a signal that there is some sort of degeneracy; along with high Rhat/low n_eff and "max treedepth exceeded" warnings, they are the basic tools for diagnosing problems with a model. Divergences almost always signal a problem, and even a small number of divergences cannot be safely ignored.
What is a divergent transition?
For some intuition, imagine walking down a steep mountain. If you take too big a step you will fall, but if you take very tiny steps you might be able to make your way down the mountain, albeit very slowly. The mountain here is our posterior distribution. A divergent transition signals that Stan was unable to find a step size big enough to actually explore the posterior while still being small enough not to fall. The problem usually lies in a somehow "uneven" or "degenerate" geometry of the posterior.
Further reading
- Identity Crisis: a rigorous treatment of the causes of divergences, their diagnosis, and treatment.
- Taming divergences in Stan models: less rigorous, but hopefully more accessible intuition on what divergent transitions are.
- Divergent transitions in the Stan reference manual
- A Conceptual Introduction to Hamiltonian Monte Carlo
Strategies to diagnose and resolve divergences

Check your code. Divergences are almost as likely to be the result of a programming error as of a genuinely statistical issue. Do all parameters have a prior? Do your array indices and for loops match?

Create a simulated dataset with known true values of all parameters. It is useful for many things (including checking for coding errors). If the problems disappear on simulated data, your model may be a bad fit for the actual observed data.
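As a minimal sketch of this step (assuming a simple linear-regression model; the parameter names and values are illustrative, not from any particular model in this post), you might simulate data from known true values and run a quick sanity check before fitting the full Stan model:

```python
import numpy as np

# Hypothetical example: simulate data from a simple linear regression
# with known "true" parameter values. If the full Stan model later
# fails on this data too, the problem is in the model, not the data.
rng = np.random.default_rng(42)

true_alpha, true_beta, true_sigma = 1.0, 2.0, 0.5  # known truths
N = 200
x = rng.uniform(-1, 1, size=N)
y = true_alpha + true_beta * x + rng.normal(0, true_sigma, size=N)

# Quick sanity check with ordinary least squares: the estimates
# should land close to the known true values.
beta_hat, alpha_hat = np.polyfit(x, y, deg=1)
print(alpha_hat, beta_hat)
```

If even a crude estimator cannot recover the known truths, a Stan model for the same data is unlikely to fare better.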

Reduce your model. Find the smallest/least complex model and a (preferably simulated) dataset that still shows the problems. Only add more complexity after you have resolved all issues with the small model. If your model has multiple components (e.g. a linear predictor for the parameters of an ODE model), build and test small models in which each component stands alone (e.g. a separate linear model and a separate ODE model with constant parameters).

Visualisations: use `mcmc_parcoord` from the bayesplot package, ShinyStan, and `pairs` from rstan.
Make sure your model is identifiable. Non-identifiability (parameters are not well informed by the data, so large changes in parameters can result in almost the same posterior density) and/or multimodality (multiple local maxima of the posterior distribution) cause problems. Further reading:
- Case study: mixture models
- Identifying non-identifiability: some informal intuition about the concept, plus examples of problematic models and how to spot them.
- Underdetermined linear regression discusses problems arising when the data cannot inform all parameters.
- Interpretation of cor term from multivariate animal models has an example where a varying intercept at the individual level is not identified.
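For a toy illustration of non-identifiability (a made-up model, not one from the links above), suppose only the sum of two parameters enters the likelihood. The log-likelihood is then exactly flat along a ridge, which a sampler struggles to explore:

```python
import numpy as np

# Toy non-identifiable model: y ~ Normal(a + b, 1).
# Only the sum a + b is informed by the data, so every pair (a, b)
# with the same sum has identical log-likelihood: a flat ridge.
rng = np.random.default_rng(0)
y = rng.normal(3.0, 1.0, size=50)  # data generated with a + b = 3

def log_lik(a, b):
    return float(-0.5 * np.sum((y - (a + b)) ** 2))

print(log_lik(0.0, 3.0) == log_lik(3.0, 0.0))  # → True
```

No amount of data can distinguish points along this ridge; only a reparametrization (e.g. sampling the sum directly) or an informative prior can.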

Check your priors. If the model is sampling heavily in the very tails of your priors or on the boundaries of parameter constraints, that is a bad sign.

Avoid overly wide prior distributions unless really large values of the parameters are plausible. Especially when working on the logarithmic scale (e.g. logistic/Poisson regression), even seemingly narrow priors like `normal(0, 1)` can actually be quite wide (this makes an odds ratio/multiplicative effect of `exp(2)`, or roughly 7.4, still a priori plausible).
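The figure quoted above (two standard deviations under a `normal(0, 1)` prior on the log scale) is easy to verify directly:

```python
import math

# A value two standard deviations out under normal(0, 1) on the log
# scale corresponds to a multiplicative effect of exp(2):
print(round(math.exp(2), 1))  # → 7.4
```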
If you have additional knowledge that would let you defensibly constrain your priors, use it. Identity Crisis has some discussion of when this can help. However, be careful not to use tighter priors than you can actually justify from background knowledge.

Reparametrize your model to make your parameters independent (uncorrelated), constrained by the data, and close to N(0,1) (i.e. change the actual parameters and compute your parameters of interest in the `transformed parameters` block). Further reading:
- Case study: diagnosing a multilevel model discusses the non-centered parametrization, which is frequently useful.
- Betancourt & Girolami 2015: a more formal treatment of the non-centered parametrization.
- Identifying non-identifiability: a sigmoid model shows an example where the parameters are not well informed by the data, while Difficulties with logistic population growth model shows a potential reparametrization.
- Reparametrizing the Sigmoid Model of Gene Regulation shows problems and solutions in an ODE model.
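The non-centered parametrization rests on the identity that if z ~ N(0, 1), then mu + tau * z ~ N(mu, tau), so Stan can sample the well-behaved z and compute theta in `transformed parameters`. A quick numerical check of that identity (the values of mu and tau here are arbitrary illustrations):

```python
import numpy as np

# Non-centered parametrization: instead of sampling theta ~ Normal(mu, tau)
# directly (centered), sample z ~ Normal(0, 1) and shift-and-scale it.
# In a Stan model, theta = mu + tau * z would live in the
# transformed parameters block.
rng = np.random.default_rng(1)
mu, tau = 1.0, 2.0              # illustrative values
z = rng.standard_normal(200_000)
theta = mu + tau * z            # distributed as Normal(mu, tau)

print(theta.mean(), theta.std())  # close to mu and tau
```

The payoff in Stan is that z has a fixed N(0,1) geometry regardless of tau, which removes the funnel-shaped dependence between tau and theta that often causes divergences in hierarchical models.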

Move parameters to the `data` block and set them to their true values (from simulated data). Then return them one by one to the `parameters` block. Which parameter introduces the problems?
Introduce tight priors centered at the true parameter values. How tight do the priors need to be to let the model fit? This is useful for identifying multimodality.
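As a toy illustration of why this works (a made-up bimodal density; all numbers are arbitrary), a tight prior centered at the true value makes the spurious mode's posterior mass negligible, so the model fits cleanly only once the prior is tight enough to exclude the extra mode:

```python
import numpy as np

# Toy bimodal (unnormalized) log-likelihood with modes near -2 and +2,
# as might arise from e.g. a mirror-image or label-switching degeneracy.
def log_lik(x):
    return np.log(np.exp(-(x + 2.0) ** 2) + np.exp(-(x - 2.0) ** 2))

# A tight prior centered at the "true" value +2:
def log_prior(x, sd=0.1):
    return -0.5 * (x - 2.0) ** 2 / sd ** 2

# Relative posterior mass of the spurious mode at -2 versus the mode
# at +2: with the tight prior it is vanishingly small.
ratio = np.exp((log_lik(-2.0) + log_prior(-2.0))
               - (log_lik(2.0) + log_prior(2.0)))
print(ratio < 1e-100)  # → True
```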

Run Stan with the `test_grad` option; it can detect some numerical instabilities in your model.
Play a bit more with `adapt_delta`, `stepsize` and `max_treedepth`; see here for an example. Note that increasing `adapt_delta` in particular has become quite common as the go-to first thing people try, and while there are cases where increasing `adapt_delta` becomes necessary for an otherwise well-behaving model, increases absent the more rigorous exploration options above can hide pathologies that may impair accurate sampling. Furthermore, increasing `adapt_delta` will certainly slow down sampling performance. You are more likely to achieve both better sampling performance and a more robust model (not to mention a better understanding thereof) by pursuing the above options and leaving adjustment of `adapt_delta` as a last resort. Increasing `adapt_delta` beyond 0.99 and `max_treedepth` beyond 12 is seldom useful. Also note that for the purpose of diagnosis it is actually better to have more divergences, so reverting to the default settings while diagnosing is recommended.
If you fail to diagnose/resolve the problem yourself, or if you have trouble understanding or executing some of the strategies outlined above, you are welcome to ask here on Discourse and we'll try to help!