Why are 4 chains used?

Why are 4 chains used as the default with Stan (and many Monte Carlo-based methods)? Specifically, I understand why multiple chains are a good idea (robustness to initial conditions). I can work my way through the fact that 3 chains are better than 2, at least at a minimal level (you get a median!). But why 4? Why not 5 or 6? Why not 3?

The only reason I can immediately come up with (with minimal research) is that 3 chains are a good idea, and if you’re running 3 chains on multiple cores, a 4th probably doesn’t take more time, because core counts are usually even (typically powers of 2).

It is mostly the even number of cores thing, plus the fact that sampling in Stan is usually pretty efficient, and when it is not you get warnings telling you so. With 1000 warmup iterations followed by 1000 kept iterations on each of 4 chains, you have 4000 nominal draws, so you still get an effective sample size of about 400 even if the effective sample size is only 10% of the nominal one. And 400 effective draws is good enough for most purposes.
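
To make that arithmetic concrete, here is a rough sketch in Python. The function name `effective_draws` is made up for illustration, and treating efficiency as a single multiplier is a simplifying assumption; Stan estimates effective sample size per parameter from the draws themselves.

```python
def effective_draws(chains: int, kept_per_chain: int, efficiency: float) -> float:
    """Back-of-the-envelope effective sample size.

    Assumes (as a simplification) that the effective sample size is a fixed
    fraction `efficiency` of the nominal number of post-warmup draws.
    """
    nominal = chains * kept_per_chain  # warmup draws are discarded, so they do not count
    return nominal * efficiency


# Compare a few chain counts at 1000 kept draws per chain and 10% efficiency.
for chains in (3, 4, 5, 6):
    print(chains, round(effective_draws(chains, 1000, 0.10)))
```

On this rough accounting each extra chain adds the same number of effective draws, so the choice of 4 comes down to what the hardware runs in parallel at no extra wall-clock cost.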


Thank you. It’s good to know that my intuition was reasonable.