I recently ran a model for only 200 iterations per chain, and the only warning Stan gave me was about exceeding the maximum tree depth. I wasn't very concerned about this warning because I understand it to be an efficiency concern rather than a validity one. Moreover, the effective sample size for the parameter of interest was over 100. I was confident things would work out fine when I ran the same model for 2000 iterations per chain while tweaking `adapt_delta=0.99` and `max_treedepth=15`. Alas, I woke up to 4k divergent transitions (DTs). This brings me to my question: **When can I trust results from a short run?**

You need to run the longer chains first and see them complete successfully and consistently. Then it's OK to cut back on post-warmup iterations until you reach your desired (smaller) effective sample size. If you shorten warmup, I would suggest revalidating.

Thanks @sakrejda! In some cases, though, each iteration takes a long time. Are there instances when it’s OK to trust results from a short run (assuming no DTs, big effective sample sizes, good Rhats)? If not, how long is long enough? @shira, any thoughts?

In many models you find the main mass of the distribution within a hundred (?) iterations and get decent sampling efficiency soon after, but larger sample sizes are what give you confidence that the sampler is mixing, that it isn't going to get stuck in a corner, that multiple chains are sampling from the same distribution, etc. I know you can make that search wider (more parallel short chains), but that's limited by the sample size required to compare the samples (for Rhat, ESS, etc.).
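To illustrate the kind of cross-chain comparison mentioned above, here is a minimal split-Rhat sketch in Python/NumPy. The `split_rhat` function and the simulated chains are my own toy example (not code from any Stan interface), following the usual split-chain diagnostic: split each chain in half and compare between-split to within-split variance.

```python
import numpy as np

def split_rhat(chains):
    """Split-Rhat for draws of shape (n_chains, n_draws).

    Each chain is split in half, then the classic potential scale
    reduction factor is computed over the resulting 2*n_chains pieces.
    Values well above ~1.01 suggest the chains disagree.
    """
    n_chains, n_draws = chains.shape
    half = n_draws // 2
    # Split each chain into two halves (row-major reshape keeps halves together).
    splits = chains[:, :2 * half].reshape(n_chains * 2, half)
    m, n = splits.shape
    chain_means = splits.mean(axis=1)
    chain_vars = splits.var(axis=1, ddof=1)
    W = chain_vars.mean()                 # within-chain variance
    B = n * chain_means.var(ddof=1)       # between-chain variance (scaled)
    var_hat = (n - 1) / n * W + B / n     # pooled variance estimate
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(0)
good = rng.standard_normal((4, 1000))             # 4 well-mixed chains
bad = good + np.array([[0.], [0.], [0.], [3.]])   # one chain stuck elsewhere
print(split_rhat(good))   # near 1.0
print(split_rhat(bad))    # clearly above 1.01
```

This is also why very short chains are hard to vet: with only a handful of draws per half-chain, both `W` and `B` are noisy, so the diagnostic has little power to flag a stuck chain.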

I wish I knew! Great question :)

Reading this now! https://betanalpha.github.io/assets/case_studies/divergences_and_bias.html#21_a_dangerously-short_markov_chain