Text for warning message

Just so that it doesn’t get lost for anyone updating the webpage, here is a list of recommendations I came up with for my blog post on divergences. Maybe this would be a good starting point?

  1. Check your code. Twice. Divergences are almost as likely a result of a programming error as they are a truly statistical issue. Do all parameters have a prior? Do your array indices and for loops match?
  2. Create a simulated dataset with known true values of all parameters. It is useful for so many things (including checking for coding errors). If the divergences disappear on simulated data, your model may be a bad fit for the actual observed data.
  3. Check your priors. If the model is sampling heavily in the very tails of your priors or on the boundaries of parameter constraints, this is a bad sign.
  4. Visualisations: use mcmc_parcoord from the bayesplot package, ShinyStan, and pairs from rstan. See also: Documentation for Stan Warnings (contains a few hints), Case study - diagnosing a multilevel model, Gabry et al. 2017 - Visualization in Bayesian workflow.
  5. Make sure your model is identifiable: non-identifiability and/or multimodality (multiple local maxima of the posterior distribution) are a problem. See: Case study - mixture models, my post on non-identifiable models and how to spot them.
  6. Run Stan with the test_grad option.
  7. Reparametrize your model to make your parameters independent (uncorrelated) and close to N(0,1) (i.e. change what you declare in the parameters block and compute your parameters of interest in the transformed parameters block).
  8. Try non-centered parametrization - this is a special case of reparametrization that is so frequently useful that it deserves its own bullet. Case study - diagnosing a multilevel model, Betancourt & Girolami 2015
  9. Move parameters to the data block and set them to their true values (from simulated data). Then return them one by one to the parameters block. Which parameter introduces the problems?
  10. Introduce tight priors centered at the true parameter values. How tight do the priors need to be to let the model fit? Useful for identifying multimodality.
  11. Play a bit more with adapt_delta, stepsize and max_treedepth. Example
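
To make items 2, 8 and 9 a bit more concrete, here is a minimal sketch of simulating a dataset with known true values. Everything in it is an assumption for illustration only: the two-level normal model, the function name simulate_hierarchical, and the parameter names mu, tau, sigma, theta and z are hypothetical and not taken from any particular Stan program. The group effects are drawn in non-centered form (theta = mu + tau * z with z ~ N(0,1)), which is the same reparametrization item 8 refers to.

```python
import random
import statistics


def simulate_hierarchical(n_groups=8, n_per_group=20,
                          mu=1.0, tau=0.5, sigma=1.0, seed=1):
    """Simulate y[g][i] ~ Normal(theta[g], sigma), theta[g] ~ Normal(mu, tau).

    Hypothetical toy model; all names and default values are illustrative.
    """
    rng = random.Random(seed)
    # Non-centered form: draw standard normal z, then shift and scale.
    z = [rng.gauss(0.0, 1.0) for _ in range(n_groups)]
    theta = [mu + tau * z_g for z_g in z]
    # Observations for each group, centered on that group's true theta.
    y = [[rng.gauss(theta_g, sigma) for _ in range(n_per_group)]
         for theta_g in theta]
    # Keep the true values so the fit can be compared against them later,
    # or so they can be passed in as data while debugging (item 9).
    truth = {"mu": mu, "tau": tau, "sigma": sigma, "theta": theta, "z": z}
    return y, truth


y, truth = simulate_hierarchical()
group_means = [statistics.fmean(y_g) for y_g in y]
for g, (m, t) in enumerate(zip(group_means, truth["theta"])):
    print(f"group {g}: sample mean {m:.2f}, true theta {t:.2f}")
```

The simulated y would then be passed to the model in place of the observed data, and the values in truth are what the posterior should recover, or what gets fixed in the data block when reintroducing parameters one at a time.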

I would also certainly point to Mike’s case study on identifiability and divergences, Identity Crisis, and to the bayesplot vignette on visual diagnostics, Visual MCMC diagnostics using the bayesplot package.

For easier maintainability, we might also want to replace the link with a link to Discourse. Discourse supports fixed routes, so we could have a fixed route, something like Divergent transitions - a primer - General - The Stan Forums, point to a summary topic (in wiki mode) and redirect it to a new topic should the summary ever need a serious update.
