Multi-chain vs single-chain

amas0 · January 18, 2023, 3:14pm

The first link you provided is mostly about multithreading within chains, i.e. using several threads to evaluate a single chain. This speeds up each individual chain, but it is separate from running multiple chains. In particular, the graph in that post is about the number of threads per chain. It shows a plateau when the author runs 4 threads per chain for 4 chains, which he rightly attributes to his computer having 16 cores.

There are a number of benefits to running multiple chains, but I’ll just share a primary one:

Multiple chains enable diagnostics such as R-hat that let us probe the validity of the sampler. The goal of any MCMC method is to generate samples from the target distribution \pi(\theta); however, we can’t ever really know in general whether our Markov chain has reached stationarity, i.e. whether the samples are from \pi(\theta). We have some theorems in our back pocket that tell us the samples will be asymptotically valid, but in practice we only ever have a finite number of draws, so we have to use heuristics and diagnostics to justify treating our samples as draws from \pi(\theta).

One method to do this is to run multiple chains and observe whether or not they have mixed. We initialize multiple chains from different points and if after some time it looks like all of the chains are generating samples from the same distribution, we can take this as a signal that all the chains have reached the same stationary distribution. R-hat measures this mixing behavior, where \hat{R} \approx 1 provides evidence in favor of mixing. Granted, in the presence of multimodal distributions even this is not necessarily a guarantee, but it’s a good start.
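
To make the idea concrete, here is a minimal sketch of a split R-hat calculation in Python with NumPy. It is the basic between-chain versus within-chain variance comparison, not Stan's exact implementation (Stan's reported R-hat also uses rank normalization), and the function name `split_rhat` and the simulated draws are my own assumptions for illustration.

```python
import numpy as np

def split_rhat(draws):
    """Basic split R-hat for one scalar parameter.

    draws: array of shape (n_chains, n_draws).
    Simplified sketch; not Stan's rank-normalized version.
    """
    draws = np.asarray(draws, dtype=float)
    n_chains, n_draws = draws.shape
    half = n_draws // 2
    # Split each chain in half so that drift within a chain
    # also shows up as disagreement between (half-)chains.
    halves = np.concatenate([draws[:, :half], draws[:, half:2 * half]], axis=0)
    n = halves.shape[1]
    chain_means = halves.mean(axis=1)
    chain_vars = halves.var(axis=1, ddof=1)
    within = chain_vars.mean()                # W: average within-chain variance
    between = n * chain_means.var(ddof=1)     # B: between-chain variance
    var_plus = (n - 1) / n * within + between / n
    return np.sqrt(var_plus / within)

rng = np.random.default_rng(0)

# Four simulated "chains" that all draw from the same distribution.
mixed = rng.normal(0.0, 1.0, size=(4, 1000))

# Same draws, but two chains are shifted, as if stuck in a different region.
stuck = mixed + np.array([[0.0], [0.0], [3.0], [3.0]])

rhat_mixed = split_rhat(mixed)
rhat_stuck = split_rhat(stuck)
print(f"mixed chains: R-hat = {rhat_mixed:.3f}")
print(f"stuck chains: R-hat = {rhat_stuck:.3f}")
```

The well-mixed chains give a value close to 1, while the shifted chains give a value well above 1. The multimodal caveat still applies: if every chain happened to get stuck in the same mode, this calculation would also return a value near 1.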

So by using multiple chains, Stan gives us additional information that lets us make a better-informed decision about whether to trust the samples.

As to why the default is 4 chains, I imagine there’s some history there I’m unaware of. At the very least, 4-core CPUs are fairly standard in most machines, so it seems like a reasonable baseline.
