In some order (a quick way to check these after a run is sketched just after the list),
- Divergences
- Max treedepth exceeded
- Low Neff
- Bad Rhat (multiple chains make this work really well imo)
- Slow chains/high treedepths
- Some chains fast and some chains slow
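Here's a rough sketch of pulling most of those diagnostics out of a fit with ArviZ – the `fit` object, the cutoffs, and ArviZ's standardized `sample_stats` names ("diverging", "tree_depth") are assumptions for illustration, not anyone's official checklist:

```python
import arviz as az

# assumes `fit` is a CmdStanPy CmdStanMCMC result
idata = az.from_cmdstanpy(fit)

# divergences and treedepth, per chain
n_div = idata.sample_stats["diverging"].sum(dim="draw")
max_td = idata.sample_stats["tree_depth"].max(dim="draw")
print("divergences per chain:", n_div.values)
print("max treedepth per chain:", max_td.values)

# low Neff / bad Rhat from the summary table (cutoffs are illustrative)
summ = az.summary(idata)
print(summ[summ["ess_bulk"] < 400])
print(summ[summ["r_hat"] > 1.01])

# some chains fast, some chains slow: compare per-chain mean treedepth
print(idata.sample_stats["tree_depth"].mean(dim="draw").values)
```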
I thought this worked pretty well: [1905.11916] Selecting the Metric in Hamiltonian Monte Carlo – that's a heuristic for, given two metrics, guessing which one will do better. Sorta different from detecting that things went awry – more a guess at what might break in the future.
Cool. It’s something we want to improve.
The workflow paper (http://www.stat.columbia.edu/~gelman/research/unpublished/Bayesian_Workflow_article.pdf) probably has more of the inspiration – some combination of wanting to run faster and fail faster.
There’s a channel on the mc-stan community Slack where we’re talking about benchmarking this stuff (basically uses of https://github.com/stan-dev/posteriordb, a database with posteriors of interest for Bayesian inference).
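For reference, a minimal sketch of grabbing a posterior out of posteriordb in Python – the local path and posterior name are placeholders, and the calls are my reading of the posteriordb README, so treat them as assumptions:

```python
from posteriordb import PosteriorDatabase

# path to a local clone of stan-dev/posteriordb (placeholder)
pdb = PosteriorDatabase("path/to/posteriordb/posterior_database")

print(pdb.posterior_names()[:5])  # list some available posteriors

po = pdb.posterior("eight_schools-eight_schools_centered")
print(po.model.code("stan"))      # Stan program for this posterior
print(po.data.values())           # data as a Python dict
draws = po.reference_draws()      # gold-standard reference draws for benchmarking
```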