Is there a one-page checklist of the captioned topic written down somewhere?
In my recent exploration as a newbie (see this post), I found the following tricks most useful:
specify any known bounds of a parameter (thereby helping Stan pick an appropriate default prior, e.g., uniform vs. normal)
explicitly specify a prior for a parameter when it reflects your knowledge about that parameter better than Stan's general-purpose default ever could
use a non-centered parameterization (EDIT: e.g., to fix problematically correlated parameters, following this example)
This fix might not work out of the box. In my case, I needed to set very tight priors for the additional `_sd`, `_L`, and `_err` parameters resulting from the non-centred parameterization.
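For readers new to this trick, here is a minimal sketch of the idea in plain Python (not Stan code; `mu` and `sigma` are hypothetical hyperparameters): instead of sampling theta directly from Normal(mu, sigma), you sample a standardized theta_raw from Normal(0, 1) and reconstruct theta deterministically, which decouples the sampler's geometry from sigma.

```python
import random
import statistics

random.seed(1)

# Hypothetical hyperparameters for a group-level effect.
mu, sigma = 2.0, 0.5

# Non-centered draw: sample a standard-normal "raw" parameter and
# shift/scale it deterministically. The distribution being explored
# no longer depends on sigma, which avoids the funnel geometry.
theta_raw = [random.gauss(0.0, 1.0) for _ in range(20000)]
theta = [mu + sigma * z for z in theta_raw]

# Both parameterizations imply the same distribution for theta.
print(round(statistics.mean(theta), 2))   # close to mu = 2.0
print(round(statistics.stdev(theta), 2))  # close to sigma = 0.5
```

In Stan the same trick is a `theta_raw ~ std_normal()` statement plus a `transformed parameters` line; the Python above just demonstrates that the implied distribution of theta is unchanged.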
if you have a model that runs fine with no divergences but the run time is slow (perhaps due to unnecessarily long warmup and iter), my findings in another post about choosing the combination of adapt_delta, max_treedepth, metric (dense_e vs. diag_e), term_buffer, window, warmup, and iter might be useful to you.
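As a concrete (hypothetical) starting point, these are the knobs that the post above refers to, written as keyword arguments in the style of CmdStanPy's `sample()`. The specific values here are illustrative assumptions, not recommendations for any particular model:

```python
# Hypothetical tuning settings; the names follow the CmdStan sampler
# options discussed above. Raising adapt_delta trades speed for fewer
# divergences; dense_e can help with correlated posteriors; shorter
# warmup saves time once you know adaptation converges quickly.
sampler_settings = dict(
    chains=4,
    iter_warmup=500,      # shorter than the default 1000 if adaptation is quick
    iter_sampling=1000,
    adapt_delta=0.9,      # raise toward 0.99 if you see divergences
    max_treedepth=10,     # raise only if the sampler actually hits the limit
    metric="dense_e",     # try "diag_e" first; dense_e for correlated params
)

# Hypothetical invocation (model file and data are placeholders):
# fit = CmdStanModel(stan_file="model.stan").sample(data=data, **sampler_settings)
print(sampler_settings["metric"])
```

The point is to change one setting at a time and re-time the run, rather than cranking everything up at once.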
Can you add to this list based on your experience? And perhaps even order your tips according to their usefulness/importance/priority.
Build the simplest model possible first. If you have a time series, start with a linear model such as y ~ x, then build that out to y ~ x + time + (1|time), and then move on to your choice of time-series models.
Make some fake data from known parameters and run that against your model. You should get your parameters back.
Keep all the models where you were playing around with priors, along with your justification for those priors. I put these into supplemental materials.
Wherever possible, cite specific threads from this user group in your R and Python workflow and in your reports/articles.
Build a non-Bayes model first and get it to run. It's useful later as a cross-check against Stan.
Check your matrices have full rank.
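A sketch of this rank check, assuming your design matrix is available as a list of rows (this uses plain Gaussian elimination for self-containedness; in practice you would call numpy.linalg.matrix_rank or R's qr()$rank):

```python
def matrix_rank(rows, tol=1e-10):
    """Rank via Gaussian elimination with partial pivoting."""
    m = [list(r) for r in rows]
    rank = 0
    n_rows, n_cols = len(m), len(m[0])
    for col in range(n_cols):
        # Pick the largest pivot in this column at or below `rank`.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # column is linearly dependent on earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from all other rows.
        for r in range(n_rows):
            if r != rank and abs(m[r][col]) > tol:
                f = m[r][col] / m[rank][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == n_rows:
            break
    return rank

# Third column is col0 + col1: a rank-deficient design matrix.
X = [[1, 0, 1],
     [0, 1, 1],
     [1, 1, 2],
     [2, 1, 3]]
print(matrix_rank(X))  # 2, not 3 -> collinear predictors
```

A rank below the number of columns means some predictors are linear combinations of others, which makes the corresponding regression coefficients non-identifiable and the sampler miserable.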
First try to get your Stan model to run with few iterations (e.g., 50) and one chain.
Use multiple chains to detect multimodality.
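A sketch of why multiple chains reveal multimodality: chains stuck in different modes inflate the between-chain variance, which the classic (basic, not rank-normalized) R-hat statistic compares against the within-chain variance. Values well above 1 mean the chains disagree.

```python
import random
import statistics

def basic_rhat(chains):
    """Classic potential scale reduction factor (the basic version,
    not the improved rank-normalized one from Vehtari et al.)."""
    n = len(chains[0])
    means = [statistics.mean(c) for c in chains]
    W = statistics.mean(statistics.variance(c) for c in chains)  # within
    B = n * statistics.variance(means)                           # between
    var_hat = (n - 1) / n * W + B / n
    return (var_hat / W) ** 0.5

random.seed(0)
# Four chains exploring the same unimodal posterior.
mixed = [[random.gauss(0, 1) for _ in range(500)] for _ in range(4)]
# Two chains stuck in one mode, two in another (a bimodal posterior).
stuck = [[random.gauss(0 if c < 2 else 5, 1) for _ in range(500)]
         for c in range(4)]

print(round(basic_rhat(mixed), 2))  # close to 1.0
print(round(basic_rhat(stuck), 2))  # far above 1.0
```

With a single chain, the `stuck` case would look perfectly converged from the inside; only running several chains from dispersed starting points exposes the problem.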
Always check n_eff for low values; these indicate problematic parameters.
Do a traceplot of problematic parameters, and look for suspicious interactions with a pairs plot.
Buy a fast computer with lots of CPU cache.
Use an exponential or half-normal distribution instead of a half-Cauchy if you have problems with standard-deviation parameters.
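The reason the half-Cauchy causes trouble is its extremely heavy tail, which lets the sampler wander into implausibly huge standard-deviation values. A small sketch comparing the tail mass of the two unit-scale priors beyond an arbitrary cutoff of 10:

```python
import math

c = 10.0  # a "large" value for a unit-scale standard-deviation prior

# P(HalfCauchy(0, 1) > c) = (2 / pi) * atan(1 / c)
tail_cauchy = (2 / math.pi) * math.atan(1 / c)

# P(HalfNormal(0, 1) > c) = erfc(c / sqrt(2))
tail_normal = math.erfc(c / math.sqrt(2))

print(f"{tail_cauchy:.3f}")  # about 0.063: real mass beyond 10
print(f"{tail_normal:.1e}")  # about 1.5e-23: essentially none
```

Over 6% of a unit half-Cauchy's mass lies beyond 10, so the sampler must repeatedly visit that region; the half-normal concentrates essentially all of its mass where the parameter plausibly lives.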
Suppress the output of large arrays/matrices you don't need in RStan, or use CmdStan.
Check for problematic scaling of the mass matrix reported by CmdStan, and adjust your parameterization accordingly.
Run optimizing instead of NUTS to check whether the values look reasonable; if not, find out why.
Step back and think about what your model should do, and don't limit yourself to one key algorithm. There are many ways. Learn about your data and apply different models; the output will help you understand your data in many ways.
Make it a routine to always save.image (a complete dump of) your session before running a model.
Do you have a pointer to any concise (or thorough) discussion of the diagnostic steps using traceplots and pairs plots that could be added here as a reference for newbies?
For example, the parameters below with n_eff between 2000 and 5000 (i.e., sd_gamma, sd_omega, g[3], w[1], w[3]) would have been particularly problematic had the sampling not used 8 chains and 3000 iterations, or had the number of data points not been as high as 200x10x4 = 8000.
But now, with all of that, are the relatively low n_eff values (and the model based on those parameters) considered acceptable?
(Relatively low when compared with those having n_eff > 10,000.)
=====================================================
Inference for Stan model:
8 chains, each with iter=3000; warmup=1000; thin=1;
post-warmup draws per chain=2000, total post-warmup draws=16000.
You have to look at the traceplots and pairs plots. I mean: if n_eff is less than 10% of the sample size, or Rhat is away from 1.0 by more than 1%, then I would think, maybe I can and should improve this.
Below are the sample plots for the example given. Nothing seems critical in this case. Have I overlooked anything alarming? I would very much appreciate your comments. Thanks in advance.
See the paper [1903.08008] Rank-normalization, folding, and localization: An improved $\widehat{R}$ for assessing convergence of MCMC. We recommend a minimum n_eff of about 100 per chain, so that the quantities needed for the convergence diagnostics are estimated reliably. Given that, instead of focusing on n_eff, you should figure out what accuracy you need for the quantity of interest (for reporting, 2 significant digits is usually enough) and then check that the MCSE is sufficient (the MCSE computation uses n_eff, but it's enough to focus on the MCSE for the quantity of interest).
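A sketch of the MCSE check described above, with hypothetical summary numbers: the MCSE of a posterior mean is roughly the posterior sd divided by sqrt(n_eff), and it should be small relative to the digits you plan to report.

```python
import math

# Hypothetical summary statistics for some quantity of interest.
posterior_mean = 2.37
posterior_sd = 0.80
n_eff = 400  # e.g., ~100 per chain across 4 chains

# Monte Carlo standard error of the estimated posterior mean.
mcse = posterior_sd / math.sqrt(n_eff)
print(round(mcse, 3))  # 0.04

# Reporting 2 significant digits (2.4) needs the Monte Carlo error
# comfortably below ~0.05, so this n_eff is just about sufficient.
print(mcse < 0.05)  # True
```

So rather than chasing a large n_eff for its own sake, you can stop once the MCSE is small enough for the precision you actually intend to report.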