Cross-chain warmup adaptation using MPI

With cross-chain warmup on top of Torsten’s parallel functions, I’m able to do 2-level parallelism: chains communicate with each other during warmup, and each chain's solution is parallelized within the chain. Here I’m showing the performance of the Chemical reactions model (all runs use 4 chains) solved by

  • regular Stan run (4 independent chains),
  • 4-core cross-chain run (each chain solved by 1 core),
  • 8-core cross-chain run (each chain solved by 2 cores),
  • 16-core cross-chain run (each chain solved by 4 cores), and
  • 32-core cross-chain run (each chain solved by 8 cores).
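To make the core layout concrete, here is a minimal sketch of how a global rank could map to a chain, a within-chain rank, and a share of the subjects. This is not Torsten's actual implementation: the function `layout`, its parameters, and the contiguous rank-to-chain assignment are all assumptions made for illustration.

```python
def layout(world_rank, n_chains, cores_per_chain, n_subjects):
    """Map a global rank to (chain, within-chain rank, subject indices).

    Hypothetical helper: assumes ranks are assigned to chains in
    contiguous blocks, which may differ from what Torsten does.
    """
    chain, local_rank = divmod(world_rank, cores_per_chain)
    if not 0 <= chain < n_chains:
        raise ValueError("world_rank is outside the chains * cores grid")

    # Split subjects as evenly as possible; the first `extra` ranks
    # take one additional subject when the division is not exact.
    per_core, extra = divmod(n_subjects, cores_per_chain)
    start = local_rank * per_core + min(local_rank, extra)
    stop = start + per_core + (1 if local_rank < extra else 0)
    return chain, local_rank, list(range(start, stop))


if __name__ == "__main__":
    # 8-core run: 4 chains, 2 cores per chain, 8 subjects (as in this model).
    for rank in range(8):
        print(rank, layout(rank, n_chains=4, cores_per_chain=2, n_subjects=8))
```

With 4 chains and 8 subjects, each core in a chain handles 8, 4, 2, or 1 subjects for the 4-, 8-, 16-, and 32-core runs respectively.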

Since the model involves a population of size 8, the within-chain parallelization evenly distributes the 8 subjects across 1, 2, 4, or 8 cores. This setup improves speed at two levels:

  • Cross-chain warmup terminates automatically at num_warmup=350. Below is the ESS performance summary.
                          MPI nproc=4    regular
warmup.leapfrogs          1.222100e+04   2.959900e+04
leapfrogs                 1.362400e+04   1.407600e+04
mean.warmup.leapfrogs     3.491714e+01   2.959900e+01
mean.leapfrogs            2.724800e+01   2.815200e+01
min(bulk_ess/iter)        1.708000e+00   1.452000e+00
min(tail_ess/iter)        2.184000e+00   2.276000e+00
min(bulk_ess/leapfrog)    6.268350e-02   5.157715e-02
min(tail_ess/leapfrog)    8.015267e-02   8.084683e-02
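
For anyone checking the arithmetic, the derived rows follow from the raw leapfrog counts. Below is a rough sketch; the function `summarize` and its argument names are hypothetical. The 350 warmup iterations for the MPI run come from the post, while the 1000 warmup iterations for the regular run and the 500 sampling iterations for both are my inference from the ratios in the table, not stated settings.

```python
def summarize(warmup_leapfrogs, leapfrogs, n_warmup, n_sampling,
              min_bulk_ess_per_iter, min_tail_ess_per_iter):
    """Recompute the per-iteration and per-leapfrog rows of the table."""
    mean_leapfrogs = leapfrogs / n_sampling
    return {
        "mean.warmup.leapfrogs": warmup_leapfrogs / n_warmup,
        "mean.leapfrogs": mean_leapfrogs,
        # ESS per leapfrog = (ESS per iteration) / (leapfrogs per iteration)
        "min(bulk_ess/leapfrog)": min_bulk_ess_per_iter / mean_leapfrogs,
        "min(tail_ess/leapfrog)": min_tail_ess_per_iter / mean_leapfrogs,
    }


# Iteration counts other than n_warmup=350 are inferred, see above.
mpi = summarize(12221, 13624, n_warmup=350, n_sampling=500,
                min_bulk_ess_per_iter=1.708, min_tail_ess_per_iter=2.184)
regular = summarize(29599, 14076, n_warmup=1000, n_sampling=500,
                    min_bulk_ess_per_iter=1.452, min_tail_ess_per_iter=2.276)

if __name__ == "__main__":
    for name in mpi:
        print(f"{name:24s} {mpi[name]:.6e} {regular[name]:.6e}")
```

By these numbers the cross-chain run spends about 41% of the warmup leapfrogs of the regular run (12221 vs. 29599), while sampling cost per iteration is about the same.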

@avehtari @Bob_Carpenter @billg @bbbales2
