I’ve been working with @bbbales2 and @avehtari to implement the campfire algorithm using MPI. The implementation is experimental, and we’d like to hear how the algorithm performs on your models.
Setup
The implementation touches all three Stan components (cmdstan, stan, and stan_math). You can retrieve them all using
git clone --recursive --branch mpi_warmup_framework https://github.com/stan-dev/cmdstan.git
MPI is required to compile and run. @bbbales2 points out that on Ubuntu, MPI can be installed with:
sudo apt install libopenmpi-dev
and the include path was:
/usr/include/mpi
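If your MPI headers live somewhere else, the MPI compiler wrapper can report their location. Open MPI's mpicxx accepts --showme:compile and MPICH's accepts -show. The helper below is a hypothetical name and not part of cmdstan. It only extracts the -I directories from that output, so you can pass one of them to -isystem.

```bash
# Hypothetical helper: print the -I directories found in an MPI compiler
# wrapper's compile line, one per line.
mpi_include_dirs() {
  local tok
  for tok in $1; do   # intentional word splitting of the compile line
    case $tok in
      -I*) printf '%s\n' "${tok#-I}" ;;
    esac
  done
}

# Usage (assumes the mpicxx wrapper is on PATH):
#   Open MPI: mpi_include_dirs "$(mpicxx --showme:compile)"
#   MPICH:    mpi_include_dirs "$(mpicxx -show)"
```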
Compile
Add the following to your make/local:
MPI_ADAPTED_WARMUP = 1
CXXFLAGS += -isystem /path/to/mpi/headers
Make sure STAN_MPI is not set there. Note that you may also need
TBB_CXX_TYPE= # C++ compiler type, e.g. gcc or clang
as required by TBB, though it is unrelated to this feature.
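The make/local changes can also be scripted. This is a minimal sketch that assumes it is pointed at a cmdstan checkout. add_campfire_make_local is a hypothetical helper name, and the default header directory is simply the Ubuntu path reported above. The helper refuses to proceed if make/local already sets STAN_MPI, because the post says STAN_MPI must not be used here.

```bash
# Hypothetical helper: append the campfire build settings to <root>/make/local.
#   $1: cmdstan root directory (default ".")
#   $2: MPI header directory   (default /usr/include/mpi)
add_campfire_make_local() {
  local root=${1:-.} mpi_inc=${2:-/usr/include/mpi}
  mkdir -p "$root/make"
  # STAN_MPI must not be enabled together with MPI_ADAPTED_WARMUP.
  if [ -f "$root/make/local" ] && grep -q '^[[:space:]]*STAN_MPI' "$root/make/local"; then
    echo "make/local sets STAN_MPI; remove it before enabling campfire warmup" >&2
    return 1
  fi
  printf '%s\n' 'MPI_ADAPTED_WARMUP = 1' "CXXFLAGS += -isystem $mpi_inc" >> "$root/make/local"
}

# Usage, from the cmdstan root:
#   add_campfire_make_local . /usr/include/mpi && make examples/radon/radon
```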
In cmdstan, compile the included radon model with:
make examples/radon/radon
Run
Depending on the MPI library (MPICH or OpenMPI; we haven’t tested others), either
mpiexec -n 4 -l ./radon sample data file=radon.data.R # MPICH
mpiexec -n 4 --tag-output ./radon sample data file=radon.data.R # OpenMPI
runs the model with 4 processes. By default this generates 4 chains that communicate with each other during warmup. As described in the original proposal, the chains exchange information to calculate Rhat and ESS. These values are used to decide whether warmup has been effective enough for sampling to start.
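The per-rank tagging flag differs between the two MPI libraries, so a run script could pick it automatically. The sketch below is hypothetical: mpi_tag_flag is not part of cmdstan. It assumes that Open MPI's "mpiexec --version" output mentions "Open MPI" or "OpenRTE" and that MPICH's Hydra launcher mentions "HYDRA". Check this against your installation.

```bash
# Hypothetical helper: map "mpiexec --version" text to the flag that
# prefixes each output line with the MPI rank.
mpi_tag_flag() {
  case $1 in
    *"Open MPI"*|*OpenRTE*) printf '%s\n' "--tag-output" ;;
    *HYDRA*)                printf '%s\n' "-l" ;;
    *) echo "unrecognized MPI implementation" >&2; return 1 ;;
  esac
}

# Usage:
#   flag=$(mpi_tag_flag "$(mpiexec --version 2>&1)") &&
#     mpiexec -n 4 "$flag" ./radon sample data file=radon.data.R
```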
Output
The stdout looks like:
[2] Iteration: 1 / 2000 [ 0%] (Warmup)
[1] Iteration: 1 / 2000 [ 0%] (Warmup)
[3] Iteration: 1 / 2000 [ 0%] (Warmup)
[0] Iteration: 1 / 2000 [ 0%] (Warmup)
[1] Iteration: 100 / 2000 [ 5%] (Warmup)
[0] Iteration: 100 / 2000 [ 5%] (Warmup)
[3] Iteration: 100 / 2000 [ 5%] (Warmup)
[2] Iteration: 100 / 2000 [ 5%] (Warmup)
[0] iteration: 100 window: 1 / 1 Rhat: 1.02 ESS: 12.39
[0] Iteration: 200 / 2000 [ 10%] (Warmup)
[2] Iteration: 200 / 2000 [ 10%] (Warmup)
[1] Iteration: 200 / 2000 [ 10%] (Warmup)
[3] Iteration: 200 / 2000 [ 10%] (Warmup)
[0] iteration: 200 window: 1 / 2 Rhat: 1.01 ESS: 24.14
[0] iteration: 200 window: 2 / 2 Rhat: 1.02 ESS: 39.90
[3] Iteration: 300 / 2000 [ 15%] (Warmup)
[1] Iteration: 300 / 2000 [ 15%] (Warmup)
Each line is prefixed with the MPI process ID. Every cross_chain_window iterations (see below), the convergence test prints the iteration, window, Rhat, and ESS. The test passes when Rhat < cross_chain_rhat and ESS > cross_chain_ess (see below).
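You can filter the stdout shown above to re-check the convergence lines, for example against different thresholds. This sketch assumes the line format in the sample output. check_windows is a hypothetical helper that applies the stated pass rule (Rhat < cross_chain_rhat and ESS > cross_chain_ess), defaulting to the listed defaults of 1.05 and 50.

```bash
# Hypothetical helper: read campfire stdout and report PASS/FAIL per window.
#   $1: Rhat threshold (default 1.05)   $2: ESS threshold (default 50)
check_windows() {
  awk -v rhat_max="${1:-1.05}" -v ess_min="${2:-50}" '
    BEGIN { rmax = rhat_max + 0; emin = ess_min + 0 }
    /Rhat:/ {                       # only the convergence-test lines
      for (i = 1; i <= NF; i++) {
        if ($i == "iteration:") it = $(i + 1)
        if ($i == "window:")    win = $(i + 1) "/" $(i + 3)
        if ($i == "Rhat:")      r = $(i + 1) + 0
        if ($i == "ESS:")       e = $(i + 1) + 0
      }
      status = (r < rmax && e > emin) ? "PASS" : "FAIL"
      printf "iter %s window %s Rhat %.2f ESS %.2f %s\n", it, win, r, e, status
    }'
}

# Usage (MPICH): mpiexec -n 4 -l ./radon sample data file=radon.data.R | check_windows
```

With the defaults, all three windows in the sample output above fail, because their ESS is below 50.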
The above run generates 4 output files. With output file=aaa.csv, the MPI outputs are mpi.0.aaa.csv through mpi.3.aaa.csv, similar to 4 separate regular sampling outputs.
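To loop over the chains afterwards, a small helper can enumerate those file names. campfire_output_files is a hypothetical name. It follows the mpi.<rank>.<name> pattern described above, with ranks 0 through n-1.

```bash
# Hypothetical helper: list per-process output files for "output file=<name>"
# run with <n> MPI processes.
campfire_output_files() {
  local name=$1 n=$2 rank=0
  while [ "$rank" -lt "$n" ]; do
    printf 'mpi.%d.%s\n' "$rank" "$name"
    rank=$((rank + 1))
  done
}

# Usage: report any chain whose output file is missing.
#   for f in $(campfire_output_files aaa.csv 4); do [ -f "$f" ] || echo "missing $f"; done
```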
cmdstan arguments
./radon help-all | grep "cross_chain.*=" -A 4
should show four additional arguments to cmdstan under the adapt category:
num_cross_chains=<unsigned int>
Number of chains in cross-chain warmup iterations
Valid values: num_cross_chains < number of MPI processes
Defaults to number of MPI processes

cross_chain_window=<unsigned int>
Window size for cross-chain warmup
Valid values: All
Defaults to 100

cross_chain_rhat=<double>
Target Rhat for cross-chain warmup
Valid values: 0.8 < cross_chain_rhat
Defaults to 1.05

cross_chain_ess=<unsigned int>
Target ESS for cross-chain warmup
Valid values: All
Defaults to 50
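These options are passed after adapt, like other adaptation arguments. The helper below is hypothetical and only assembles the argument list. The values in the usage line are arbitrary illustrations, not recommendations.

```bash
# Hypothetical helper: CmdStan arguments for a campfire run of the radon example.
#   $1 cross_chain_window, $2 cross_chain_rhat, $3 cross_chain_ess
#   (defaults match the defaults listed in the help output)
campfire_args() {
  printf '%s\n' sample adapt \
    "cross_chain_window=${1:-100}" \
    "cross_chain_rhat=${2:-1.05}" \
    "cross_chain_ess=${3:-50}" \
    data file=radon.data.R
}

# Usage (OpenMPI):
#   mpiexec -n 4 --tag-output ./radon $(campfire_args 50 1.02 100)
```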
In particular:
- The convergence test is performed every cross_chain_window iterations.
- The number of chains defaults to the number of MPI processes. You can set it manually with num_cross_chains, but it must be less than or equal to the number of MPI processes.
- When warmup is deemed sufficient, each chain’s stepsize is overwritten with the average stepsize across chains, and the mass matrix is recalculated using draws from all the chains.
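The combination step described above can be illustrated for a single parameter. This is a sketch under assumptions, not the branch's code. The input format (one line per chain: the step size followed by that chain's draws) is made up for illustration. The pooled diagonal entry, which Stan's output reports as an element of the inverse mass matrix, is taken to be the plain sample variance of all draws. Stan's single-chain diagonal adaptation additionally shrinks that estimate slightly, and this sketch omits the shrinkage.

```bash
# Hypothetical helper. Input: one line per chain, "<stepsize> <draw1> <draw2> ...",
# for one parameter. Prints the averaged step size and the pooled sample
# variance of all draws (Welford's online algorithm).
campfire_combine() {
  awk '
    {
      step_sum += $1; chains++
      for (i = 2; i <= NF; i++) {
        n++; d = $i - mean; mean += d / n; m2 += d * ($i - mean)
      }
    }
    END {
      if (n < 2) { print "need at least two draws" > "/dev/stderr"; exit 1 }
      printf "stepsize=%.4f var=%.4f\n", step_sum / chains, m2 / (n - 1)
    }'
}

# Usage: printf '0.5 1 2 3\n0.7 4 5 6\n' | campfire_combine
```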
Caveats

- Rhat and ESS in the tests are based on lp__ only.
- Currently ESS is not calculated as in Stan but only from single-chain draws. A cross-chain version is on the way.
- Currently only implemented for adaptation of diagonal metrics. Dense metrics are on the way.
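For reference, the sketch below computes the classic (non-split) Gelman-Rubin Rhat for a single quantity such as lp__, given one column of draws per chain. It is an illustration only. The branch may use a different Rhat variant, such as split or rank-normalized Rhat. The input layout and the helper name rhat_lp are assumptions.

```bash
# Hypothetical helper: classic Gelman-Rubin Rhat for one quantity.
# Input: whitespace-separated table, one row per draw, one column per chain.
rhat_lp() {
  awk '
    {
      n++; m = NF
      for (j = 1; j <= NF; j++) { x[n, j] = $j; sum[j] += $j }
    }
    END {
      if (n < 2 || m < 2) { print "need at least 2 draws and 2 chains" > "/dev/stderr"; exit 1 }
      for (j = 1; j <= m; j++) { cm[j] = sum[j] / n; grand += cm[j] / m }
      for (j = 1; j <= m; j++) {
        B += (cm[j] - grand) ^ 2            # squared deviation of each chain mean
        s2 = 0
        for (i = 1; i <= n; i++) s2 += (x[i, j] - cm[j]) ^ 2
        W += s2 / (n - 1) / m               # mean within-chain variance
      }
      B = B * n / (m - 1)                   # between-chain variance
      if (W == 0) { print "zero within-chain variance" > "/dev/stderr"; exit 1 }
      varplus = (n - 1) / n * W + B / n     # pooled posterior variance estimate
      printf "%.4f\n", sqrt(varplus / W)
    }'
}

# Usage: printf '1 11\n2 12\n3 13\n' | rhat_lp   # chains with different means give Rhat >> 1
```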
We’d like to hear your comments and feedback!