I am wondering, for cluster/parallel computing, whether it is generally a good strategy to set the number of chains equal to the number of cores available, for the purpose of speeding up the code. If I wanted 4000 non-warmup iterations and I had 40 cores available, would it be wise to run 40 chains of 200 iterations each, with 100 of those as burn-in, to get 100 non-warmup iterations per chain?
What are some issues I should be aware of with such a set-up? Thanks!
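To make the arithmetic concrete, here is a small sketch (plain Python with a hypothetical helper name, not a Stan call) of the draw budget under the setup described above:

```python
def draw_budget(cores, target_draws, warmup_per_chain):
    """Split a target number of post-warmup draws across one chain per core."""
    chains = cores
    sampling_per_chain = target_draws // chains
    return {
        "chains": chains,
        "sampling_per_chain": sampling_per_chain,
        "iter_per_chain": warmup_per_chain + sampling_per_chain,
        "total_warmup_iterations": chains * warmup_per_chain,
    }

# The setup from the question: 40 cores, 4000 post-warmup draws, 100 warmup.
plan = draw_budget(cores=40, target_draws=4000, warmup_per_chain=100)
print(plan)
# -> 40 chains of 200 iterations each, but 4000 total iterations spent on warmup
```

Note that while the per-chain work is small, the total number of warmup iterations across all chains equals the entire post-warmup budget.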
The adaptation phase should be longer than that; otherwise you may get warnings about suboptimal adaptation (I don’t remember the exact message). In my models I generally see warnings if I use fewer than 300 iterations, but it may be different for you. To be on the safe side, I would suggest at least 500. And once you’ve spent that time on adaptation, it probably makes sense to use it to sample a longer chain, since sampling should be faster at that point.
In terms of collecting posterior samples, there is an advantage to having more chains, but I don’t think having, say, 40 short chains is better than 8 much longer ones, unless your model is so complicated or big that it’s unbearably slow. And if it is, you may run into memory problems, as you asked about in the other thread.
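To illustrate the trade-off: at 500 warmup iterations per chain, the share of total computation spent on adaptation for a fixed 4000-draw budget differs substantially between many short chains and a few longer ones. This is a back-of-the-envelope sketch, not an actual Stan run:

```python
def warmup_overhead(chains, target_draws, warmup_per_chain=500):
    """Fraction of all iterations spent on adaptation rather than sampling."""
    warmup = chains * warmup_per_chain
    return warmup / (warmup + target_draws)

# 40 short chains: 20000 warmup iterations for 4000 draws
print(round(warmup_overhead(40, 4000), 2))  # 0.83
# 8 longer chains: 4000 warmup iterations for the same 4000 draws
print(round(warmup_overhead(8, 4000), 2))   # 0.5
```

With 40 chains, over 80% of the iterations go to warmup; with 8 chains, it is half, and each chain's adaptation is amortized over more post-warmup draws.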
I’ll often run for 100 or 200 iterations while I’m building models, because even a small number of iterations can get Stan close to the posterior distribution. You can always run longer once you’ve settled on a model.
Also, we’re working on various ideas for improved adaptation.