Just so that it doesn’t get lost for anyone updating the webpage, here is a list of recommendations I came up with for my blog post on divergences. Maybe this would be a good starting point:
- Check your code. Twice. Divergences are almost as likely a result of a programming error as they are a truly statistical issue. Do all parameters have a prior? Do your array indices and for loops match?
- Create a simulated dataset with known true values of all parameters. It is useful for so many things (including checking for coding errors). If the errors disappear on simulated data, your model may be a bad fit for the actual observed data.
- Check your priors. If the model is sampling heavily in the very tails of your priors or on the boundaries of parameter constraints, this is a bad sign.
- Visualisations: use `mcmc_parcoord` from the `bayesplot` package, Shinystan and `pairs` from `rstan`. Documentation for Stan Warnings (contains a few hints), Case study - diagnosing a multilevel model, Gabry et al. 2017 - Visualization in Bayesian workflow
- Make sure your model is identifiable - non-identifiability and/or multimodality (multiple local maxima of the posterior distributions) is a problem. Case study - mixture models, my post on non-identifiable models and how to spot them.
- Run Stan with the `test_grad` option.
- Reparametrize your model to make your parameters independent (uncorrelated) and close to N(0,1) (a.k.a. change the actual parameters and compute your parameters of interest in the `transformed parameters` block).
- Try non-centered parametrization - this is a special case of reparametrization that is so frequently useful that it deserves its own bullet. Case study - diagnosing a multilevel model, Betancourt & Girolami 2015
- Move parameters to the `data` block and set them to their true values (from simulated data). Then return them one by one to the `parameters` block. Which parameter introduces the problems?
- Introduce tight priors centered at true parameter values. How tight do the priors need to be to let the model fit? Useful for identifying multimodality.
- Play a bit more with `adapt_delta`, `stepsize` and `max_treedepth`. Example
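
To make the simulated-data and non-centered bullets a bit more concrete, here is a minimal sketch in plain Python (standard library only). It assumes a simple normal hierarchical model; the function name `simulate_hierarchical` and the names `mu`, `tau`, `sigma`, `z`, `theta` and `y` are illustrative choices of mine, not taken from any particular Stan program. The group effects are built in the non-centered way, as standard-normal offsets that are then shifted and scaled.

```python
import random


def simulate_hierarchical(n_groups, mu, tau, sigma, seed=0):
    """Simulate grouped data from a normal hierarchical model with known truth.

    All names (mu, tau, sigma, z, theta, y) are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Non-centered form: draw standard-normal offsets first ...
    z = [rng.gauss(0.0, 1.0) for _ in range(n_groups)]
    # ... then shift and scale them to obtain the group-level effects.
    theta = [mu + tau * z_j for z_j in z]
    # One noisy observation per group, with known measurement noise sigma.
    y = [rng.gauss(theta_j, sigma) for theta_j in theta]
    return {
        # True values, kept so the fitted posterior can be checked against them.
        "truth": {"mu": mu, "tau": tau, "z": z, "theta": theta},
        # What would be passed to the model as data.
        "data": {"J": n_groups, "y": y, "sigma": sigma},
    }


sim = simulate_hierarchical(n_groups=8, mu=1.0, tau=0.5, sigma=2.0, seed=42)
print(sim["data"]["y"])
```

The `data` part is what you would hand to the model. The `truth` part is what you compare the posterior against, and also where the fixed values come from when moving parameters to the `data` block one at a time or centering tight priors on the true values.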
I would also certainly point to Mike’s case study on identifiability and divergences… Identity Crisis, and to the vignette on visual diagnostics: Visual MCMC diagnostics using the bayesplot package.
For easier maintainability, we might also want to replace the link with a link to Discourse. Discourse supports fixed routes, so we could have something like "Divergent transitions - a primer - General - The Stan Forums" link to a summary topic (in wiki mode) and be able to redirect it to a new topic, should it ever need a serious update.