If you have a model that runs fine in the sense of having no divergences, but runs slowly (perhaps because warmup and iter are unnecessarily long), my findings in another post about choosing the combination of adapt_delta / max_treedepth / metric=dense_e vs. diag_e / term_buffer / window / warmup / iter might be useful to you.
Can you add to this list based on your experience? And perhaps even order your tips according to their usefulness/importance/priority.
Build a non-Bayesian model first and get it to run. It’s useful as a cross-check against Stan later.
Check that your matrices have full rank.
Try to get your Stan model running with a low number of iterations (e.g., 50) and one chain.
Use multiple chains to detect multimodality.
Always look for low neff values; these indicate problematic parameters.
Make traceplots of problematic parameters, and look for suspicious interactions with a pairs plot.
Buy a fast computer with lots of CPU cache.
Use an exponential or half-normal prior instead of a half-Cauchy if you run into problems with standard deviation parameters.
Suppress the output of large arrays/matrices you don’t need in RStan, or use CmdStan.
Check the mass matrix adapted by CmdStan for problematic scaling, and adjust the parameterization accordingly.
Run optimizing instead of NUTS to check whether the values look reasonable; if they don’t, find out why.
Step back and think about what your model should do, and don’t limit yourself to a single algorithm. There are many ways. Learn about your data and apply different models; their output will help you understand your data in many ways.
Develop a routine: always save.image() (a complete dump of your session) before running a model.
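As a minimal sketch of the first two tips, assume a regression-style model with a design matrix `X` and outcome `y`. These names, the helper functions, and the simulated data are hypothetical and not from the post. One can check the column rank with NumPy and fit an ordinary least-squares baseline to compare against Stan's posterior means later:

```python
import numpy as np

def has_full_column_rank(X: np.ndarray) -> bool:
    """Return True if the columns of X are linearly independent."""
    # matrix_rank uses an SVD with a default tolerance for "numerically zero"
    return np.linalg.matrix_rank(X) == X.shape[1]

def ols_baseline(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Non-Bayesian least-squares fit, kept for cross-checking Stan later."""
    coef, _residuals, _rank, _sv = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
X_good = np.column_stack([np.ones(n), x1])
X_bad = np.column_stack([np.ones(n), x1, 2 * x1])  # third column duplicates the second
y = 1.0 + 0.5 * x1 + rng.normal(scale=0.1, size=n)

print(has_full_column_rank(X_good))  # True
print(has_full_column_rank(X_bad))   # False
print(ols_baseline(X_good, y))       # close to [1.0, 0.5]
```

A rank-deficient matrix like `X_bad` makes some coefficients non-identified, and that tends to show up in Stan as exactly the low-neff, strongly correlated parameters described in the tips.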
For example, the parameters below with neff between 2000 and 5000 (i.e., sd_gamma, sd_omega, g, w, w) would have been particularly problematic had the sampling not used 8 chains of 3000 iterations each, or had the number of data points not been as high as 200 × 10 × 4 = 8000.
But with all this, are these relatively low neff values (and the model based on those parameters) acceptable?
(Relatively low compared with the parameters whose neff > 10,000.)
Inference for Stan model:
8 chains, each with iter=3000; warmup=1000; thin=1;
post-warmup draws per chain=2000, total post-warmup draws=16000.
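The header numbers follow from post-warmup draws per chain = (iter − warmup) / thin. The short Python sketch below uses illustrative variable names. It reproduces those numbers and expresses the quoted neff range as a fraction of all post-warmup draws, which is one rough way to make "relatively low" concrete:

```python
chains = 8
iter_per_chain = 3000
warmup = 1000
thin = 1

draws_per_chain = (iter_per_chain - warmup) // thin   # 2000
total_draws = chains * draws_per_chain                # 16000

# neff as a fraction of the total post-warmup draws, for the range quoted above
for neff in (2000, 5000):
    print(neff, neff / total_draws)
# 2000 0.125
# 5000 0.3125
```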
Below are the sample plots for this example. There seems to be nothing critical in this case. Have I overlooked anything alarming? I would very much appreciate your comments. Thanks in advance.
Instead of traceplots, we recommend rank histogram plots (https://arxiv.org/abs/1903.08008). The online appendix has some examples comparing traceplots and rank histogram plots. Traceplots are not good for long-tailed distributions and get fuzzy with long chains.
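In practice, rank plots come from existing tools such as ArviZ's `plot_rank`. As a rough illustration of the idea only, the sketch below pools the draws of all chains, ranks them jointly, and counts each chain's ranks in equal-width bins. The function name `rank_histograms`, the 20 bins, and the simulated chains are assumptions for illustration. If the chains mix well, every chain's histogram should be roughly flat. A chain stuck in a different region piles up at one end:

```python
import numpy as np

def rank_histograms(draws: np.ndarray, n_bins: int = 20) -> np.ndarray:
    """Per-chain histogram counts of jointly ranked draws.

    draws: shape (n_chains, n_draws), one scalar parameter.
    Returns an integer array of shape (n_chains, n_bins).
    """
    n_chains, n_draws = draws.shape
    flat = draws.ravel()
    # Rank all draws from all chains together: 0 .. N-1.
    # Ties are broken by position, which is fine for continuous parameters.
    ranks = np.empty(flat.size, dtype=int)
    ranks[np.argsort(flat, kind="stable")] = np.arange(flat.size)
    ranks = ranks.reshape(n_chains, n_draws)
    # Equal-width bins over the rank range; each rank falls in exactly one bin.
    edges = np.linspace(0, flat.size, n_bins + 1)
    return np.stack([np.histogram(r, bins=edges)[0] for r in ranks])

rng = np.random.default_rng(0)
good = rng.normal(size=(4, 1000))   # four chains sampling the same target
bad = good.copy()
bad[3] += 1.0                       # fourth chain stuck in a shifted region

h_good = rank_histograms(good)
h_bad = rank_histograms(bad)
print(h_good.min(), h_good.max())   # all counts near 1000 / 20 = 50
print(h_bad[3])                     # skewed toward the high-rank bins
```

The paper uses rank normalization for more than plotting (e.g., the improved R-hat). This sketch covers only the histogram part.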
See the paper https://arxiv.org/abs/1903.08008. We recommend a minimum neff of about 100 per chain so that the quantities needed for the convergence diagnostics are estimated reliably. Once that holds, instead of focusing on neff, you should figure out the accuracy needed for the quantity of interest (for reporting, two significant digits are usually enough) and then check that the MCSE is small enough (the MCSE computation uses neff, but it’s enough to focus on the MCSE for the quantity of interest).
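For a posterior mean, the Monte Carlo standard error is approximately posterior sd / √neff. The sketch below turns "two significant digits" into a concrete check. The rule "MCSE below half a unit in the last reported digit" is an assumed heuristic, not something stated in the reply or the paper. The example values are also hypothetical, and the formula applies to means only; quantiles need a different MCSE estimate:

```python
import math

def mcse_mean(posterior_sd: float, neff: float) -> float:
    """Approximate Monte Carlo standard error of a posterior mean."""
    return posterior_sd / math.sqrt(neff)

def enough_for_digits(estimate: float, mcse: float, sig_digits: int = 2) -> bool:
    """Assumed heuristic: MCSE below half a unit in the last reported digit.

    estimate must be nonzero (log10 is undefined at 0).
    """
    # Size of one unit in the last significant digit, e.g. 1.23 with 2 digits -> 0.1
    last_digit = 10 ** (math.floor(math.log10(abs(estimate))) - sig_digits + 1)
    return mcse < last_digit / 2

# Recommended minimum from the reply: about 100 per chain, here with 8 chains
min_total_neff = 100 * 8   # 800

# Hypothetical parameter: posterior mean 1.23, posterior sd 0.5
print(enough_for_digits(1.23, mcse_mean(0.5, 2000)))  # True  (MCSE ~ 0.011)
print(enough_for_digits(1.23, mcse_mean(0.5, 50)))    # False (MCSE ~ 0.071)
```

Under these assumptions, an neff of 2000 is already ample for reporting such a mean to two significant digits, even though it is "low" compared with other parameters.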