Cross-chain warmup adaptation using MPI

yizhang · June 17, 2020, 6:33am

With cross-chain warmup on top of Torsten’s parallel functions, I’m able to do 2-level parallelism: chains communicating with each other during warmup, and a parallel solution within each chain. Here I’m showing the performance of the Chemical reactions model (all runs use 4 chains) solved by

regular Stan run (4 independent chains),
4-core cross-chain run (each chain solved by 1 core),
8-core cross-chain run (each chain solved by 2 cores),
16-core cross-chain run (each chain solved by 4 cores), and
32-core cross-chain run (each chain solved by 8 cores).

Since the model involves a population of size 8, the within-chain parallelization evenly distributes the 8 subjects across 1, 2, 4, or 8 cores. This setup improves speed at two levels:

Cross-chain warmup automatically terminates at num_warmup=350. Below is the ESS performance summary.

Metric	MPI (nproc=4)	regular
warmup.leapfrogs	1.222100e+04	2.959900e+04
leapfrogs	1.362400e+04	1.407600e+04
mean.warmup.leapfrogs	3.491714e+01	2.959900e+01
mean.leapfrogs	2.724800e+01	2.815200e+01
min(bulk_ess/iter)	1.708000e+00	1.452000e+00
min(tail_ess/iter)	2.184000e+00	2.276000e+00
min(bulk_ess/leapfrog)	6.268350e-02	5.157715e-02
min(tail_ess/leapfrog)	8.015267e-02	8.084683e-02
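
For anyone checking the numbers, here is a small sketch of how the derived rows in the table appear to follow from the raw ones. Only num_warmup=350 for the cross-chain run is stated above; the 1000 warmup iterations for the regular run and the 500 sampling iterations are my inference from the ratios in the table, so treat them as assumptions. The names `runs` and `derived_rows` are made up for this illustration.

```python
# Raw counts and per-iteration ESS copied from the summary table above.
runs = {
    "MPI nproc=4": {
        "num_warmup": 350,  # stated in the post
        "warmup_leapfrogs": 12221,
        "leapfrogs": 13624,
        "min_bulk_ess_per_iter": 1.708,
        "min_tail_ess_per_iter": 2.184,
    },
    "regular": {
        "num_warmup": 1000,  # ASSUMPTION: inferred from 29599 / 29.599
        "warmup_leapfrogs": 29599,
        "leapfrogs": 14076,
        "min_bulk_ess_per_iter": 1.452,
        "min_tail_ess_per_iter": 2.276,
    },
}

NUM_SAMPLING = 500  # ASSUMPTION: inferred from 13624 / 27.248


def derived_rows(run, num_sampling=NUM_SAMPLING):
    """Recompute the mean.* and */leapfrog rows from the raw rows."""
    mean_leapfrogs = run["leapfrogs"] / num_sampling
    return {
        "mean.warmup.leapfrogs": run["warmup_leapfrogs"] / run["num_warmup"],
        "mean.leapfrogs": mean_leapfrogs,
        # ESS per leapfrog = ESS per iteration / leapfrogs per iteration
        "min(bulk_ess/leapfrog)": run["min_bulk_ess_per_iter"] / mean_leapfrogs,
        "min(tail_ess/leapfrog)": run["min_tail_ess_per_iter"] / mean_leapfrogs,
    }


for name, run in runs.items():
    print(name)
    for row, value in derived_rows(run).items():
        print(f"  {row}: {value:.6e}")
```

Under those assumptions the recomputed values agree with the table to the printed precision, which suggests the per-leapfrog ESS rows count sampling leapfrogs only, not warmup.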

Within-chain parallel solution gives a further speedup. Below is the raw wall time (s) comparison.

[Figure: wall_time_perf (raw wall time in seconds)]
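
To make the 2-level layout concrete, here is a minimal sketch of how nproc processes could be split into 4 chains, with each chain's cores sharing the 8 subjects evenly. The contiguous rank-to-chain mapping and the function name `layout` are assumptions for illustration only; this is not necessarily how Torsten assigns ranks internally.

```python
def layout(nproc, num_chains=4, num_subjects=8):
    """Map each rank to (chain index, list of subject indices it solves).

    ASSUMPTION: ranks are grouped into chains in contiguous blocks, and
    subjects are split into contiguous, equally sized blocks per core.
    """
    if nproc % num_chains != 0:
        raise ValueError("nproc must be a multiple of num_chains")
    cores_per_chain = nproc // num_chains
    if num_subjects % cores_per_chain != 0:
        raise ValueError("subjects must divide evenly among a chain's cores")
    subjects_per_core = num_subjects // cores_per_chain

    plan = {}
    for rank in range(nproc):
        chain = rank // cores_per_chain       # level 1: which chain
        local = rank % cores_per_chain        # level 2: position within chain
        first = local * subjects_per_core
        plan[rank] = (chain, list(range(first, first + subjects_per_core)))
    return plan


# The four cross-chain configurations from the post.
for nproc in (4, 8, 16, 32):
    plan = layout(nproc)
    print(f"nproc={nproc}: rank 0 -> {plan[0]}, rank {nproc - 1} -> {plan[nproc - 1]}")
```

With nproc=4 every core solves all 8 subjects for its own chain; with nproc=32 every core solves a single subject, which is the finest split this population size allows.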

@avehtari @Bob_Carpenter @billg @bbbales2
