Why are 4 chains the default in Stan (and in many Monte Carlo-based methods)? I understand why multiple chains are a good idea (robustness to initial conditions). I can also see, at least at a minimal level, why 3 chains are better than 2 (you get a median!). But why 4? Why not 5 or 6? Why not 3?
The only reason I can come up with right away (with minimal research) is that 3 chains are a good idea, and if you’re running 3 chains across multiple cores, a 4th probably doesn’t take any more wall-clock time, because you probably have an even number of cores (core counts tend to be powers of 2).
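
To make that guess concrete, here is a minimal sketch of the arithmetic. It assumes an idealized setup that is mine, not anything Stan documents: every chain takes the same time, each core runs one chain at a time, and there is no scheduling overhead. The function name `wall_clock_rounds` is made up for illustration.

```python
import math


def wall_clock_rounds(n_chains: int, n_cores: int) -> int:
    """Sequential 'rounds' needed to run n_chains on n_cores.

    Idealized assumption: one chain per core at a time, all chains
    take equal time, no overhead.
    """
    return math.ceil(n_chains / n_cores)


# Compare chain counts on a hypothetical 4-core machine.
for n_chains in (3, 4, 5, 6):
    rounds = wall_clock_rounds(n_chains, n_cores=4)
    print(f"{n_chains} chains on 4 cores: {rounds} round(s)")
```

Under these assumptions, 3 and 4 chains both finish in one round on 4 cores, while 5 or 6 chains need a second round. That is consistent with my guess, but it only shows the guess is plausible, not that it is the actual reason for the default.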