Multiple chains and posterior exploration

Excellent way of putting it!

The important question is then “how long does it take to produce viable diagnostics?” Diagnostics like Rhat are iteration-hungry: it takes a good number of effective samples to resolve differences between the chains, which is why Rhat often misses pathologies. Diagnostics like divergences are more sensitive, but you still need reasonably long chains to accumulate enough divergences to identify where in the model the pathology is manifesting.
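To make the Rhat point concrete, here is a minimal sketch of the classic split-Rhat calculation in plain Python. It is not Stan's implementation (which, as far as I know, also rank-normalizes the draws), and the `simulate_chains` helper is purely hypothetical: it fakes chains with independent normal draws, which is far kinder than real, autocorrelated MCMC output.

```python
import random
import statistics


def split_rhat(chains):
    """Classic split-Rhat for a list of equal-length chains of scalar draws."""
    half = len(chains[0]) // 2
    # Split each chain in two so that drift within a chain is also visible.
    halves = []
    for chain in chains:
        halves.append(chain[:half])
        halves.append(chain[half:2 * half])
    n = half
    # W: average within-chain variance; B: n times the variance of chain means.
    within = statistics.mean(statistics.variance(h) for h in halves)
    between = n * statistics.variance([statistics.mean(h) for h in halves])
    var_plus = (n - 1) / n * within + between / n
    return (var_plus / within) ** 0.5


def simulate_chains(n_chains, n_draws, offset, seed):
    """Hypothetical stand-in for a sampler: iid N(0, 1) draws per chain,
    with the last chain shifted by `offset` to mimic a stuck chain."""
    rng = random.Random(seed)
    chains = [[rng.gauss(0.0, 1.0) for _ in range(n_draws)]
              for _ in range(n_chains)]
    chains[-1] = [x + offset for x in chains[-1]]
    return chains


# Compare how clearly a modest discrepancy shows up at different lengths.
for n_draws in (20, 200, 2000):
    healthy = split_rhat(simulate_chains(4, n_draws, 0.0, seed=1))
    shifted = split_rhat(simulate_chains(4, n_draws, 0.3, seed=1))
    print(n_draws, round(healthy, 3), round(shifted, 3))
```

With short chains the Monte Carlo noise in Rhat itself is comparable to the signal from a modest discrepancy, so the healthy and shifted cases are hard to tell apart; the gap only becomes unambiguous as the chains get longer.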

Ultimately you need to run each chain long enough to get reasonable expectation estimates, so by the time the diagnostics are really robust the chain will almost be long enough for your inferential goal! There is some wiggle room, however, and there may be an opportunity for moderate speedups with 2-10 chains. Any more than that should be considered only for improving diagnostics (more opportunities to randomly fall onto a pathology early), not for speeding up inferences.
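A toy calculation shows why the speedup saturates. Assume, purely for illustration, that the inferential goal needs `target_draws` draws in total and that each chain needs at least `min_diagnostic_draws` draws before its diagnostics can be trusted. Both numbers and both function names below are made up, and warmup cost is ignored entirely, which would only make extra chains look worse.

```python
def per_chain_length(target_draws, min_diagnostic_draws, n_chains):
    """Draws each chain must run: its share of the target, but never
    fewer than the diagnostics need."""
    return max(target_draws / n_chains, min_diagnostic_draws)


def speedup(target_draws, min_diagnostic_draws, n_chains):
    """Wall-clock speedup relative to one chain, assuming all chains
    run fully in parallel (an idealisation)."""
    return target_draws / per_chain_length(
        target_draws, min_diagnostic_draws, n_chains)


# Hypothetical numbers: 1000 draws for the estimates, 250 per chain
# for trustworthy diagnostics.
for n_chains in (1, 2, 4, 10, 100):
    print(n_chains, speedup(1000, 250, n_chains))
```

Under these assumptions the speedup grows up to 4 chains and then stays flat at 4x no matter how many more are added. Beyond that point the extra chains only buy more chances to hit a pathology, not faster inferences.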