Cross-chain warmup adaptation using MPI

Not really. For this model, post-warmup sampling takes approximately the same amount of time for regular and cross-chain runs (regular vs. nproc=4 in the above plot). With additional cores, the benefit of within-chain parallelization kicks in, and run time is further reduced for both warmup and sampling (nproc=8, nproc=16, and nproc=32 in the above plot).