I’ve been working with @bbbales2 and @avehtari to implement the campfire algorithm using MPI. The implementation is experimental, and we’d like to hear how the algorithm performs on your models.
Setup
The implementation touches all three Stan components (cmdstan, stan, and stan_math). One can retrieve them all using
git clone --recursive --branch mpi_warmup_framework https://github.com/stan-dev/cmdstan.git
MPI is required to compile and run. @bbbales2 points out that on Ubuntu, installing MPI was:
sudo apt install libopenmpi-dev
And the include path was:
/usr/include/mpi
Compile
Add
MPI_ADAPTED_WARMUP = 1
CXXFLAGS += -isystem /path/to/mpi/headers
in your make/local. Make sure STAN_MPI is not used here. Note that you may also need
TBB_CXX_TYPE= # cpp compiler
as required by TBB, though it’s irrelevant to this discussion.
In cmdstan, compile the included radon model with
make examples/radon/radon
Run
Depending on the MPI library (MPICH or OpenMPI; we haven’t tested others), one of
mpiexec -n 4 -l ./radon sample data file=radon.data.R #MPICH
mpiexec -n 4 --tag-output ./radon sample data file=radon.data.R #OpenMPI
runs the model with 4 processes. By default this generates 4 chains that communicate with each other during warmup. As described in the original proposal, the chains exchange information to calculate Rhat and ESS, which are used to judge whether warmup has been sufficient so that sampling can start.
Output
stdout looks like
[2] Iteration: 1 / 2000 [ 0%] (Warmup)
[1] Iteration: 1 / 2000 [ 0%] (Warmup)
[3] Iteration: 1 / 2000 [ 0%] (Warmup)
[0] Iteration: 1 / 2000 [ 0%] (Warmup)
[1] Iteration: 100 / 2000 [ 5%] (Warmup)
[0] Iteration: 100 / 2000 [ 5%] (Warmup)
[3] Iteration: 100 / 2000 [ 5%] (Warmup)
[2] Iteration: 100 / 2000 [ 5%] (Warmup)
[0] iteration: 100 window: 1 / 1 Rhat: 1.02 ESS: 12.39
[0] Iteration: 200 / 2000 [ 10%] (Warmup)
[2] Iteration: 200 / 2000 [ 10%] (Warmup)
[1] Iteration: 200 / 2000 [ 10%] (Warmup)
[3] Iteration: 200 / 2000 [ 10%] (Warmup)
[0] iteration: 200 window: 1 / 2 Rhat: 1.01 ESS: 24.14
[0] iteration: 200 window: 2 / 2 Rhat: 1.02 ESS: 39.90
[3] Iteration: 300 / 2000 [ 15%] (Warmup)
[1] Iteration: 300 / 2000 [ 15%] (Warmup)
Each line is prefixed with the MPI process ID. Every cross_chain_window iterations (see below), the convergence test output shows the iteration, window, Rhat, and ESS. The test passes when Rhat < cross_chain_rhat and ESS > cross_chain_ess (see below).
The above run generates 4 output files. With output file=aaa.csv, the MPI output would be mpi.0.aaa.csv…mpi.3.aaa.csv, similar to 4 separate regular sampling outputs.
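As a rough illustration of the pass criterion, the sketch below parses the convergence-test lines from the stdout sample above and applies the default thresholds (cross_chain_rhat=1.05, cross_chain_ess=50). The regular expression and the helper names parse_test_line and passes are hypothetical and not part of cmdstan; the sketch assumes the MPICH-style [rank] prefix shown in the sample, and it only checks one window at a time since how multiple windows are combined is not covered here.

```python
import re

# Matches lines such as "[0] iteration: 200 window: 2 / 2 Rhat: 1.02 ESS: 39.90".
# Progress lines ("Iteration: ... (Warmup)") have no "window:" field and are skipped.
TEST_LINE = re.compile(
    r"\[(\d+)\]\s+iteration:\s*(\d+)\s+window:\s*(\d+)\s*/\s*(\d+)"
    r"\s+Rhat:\s*([\d.]+)\s+ESS:\s*([\d.]+)"
)


def parse_test_line(line):
    """Return the fields of one convergence-test line, or None if it is another kind of line."""
    m = TEST_LINE.search(line)
    if m is None:
        return None
    rank, iteration, window, n_windows, rhat, ess = m.groups()
    return {
        "rank": int(rank),
        "iteration": int(iteration),
        "window": int(window),
        "n_windows": int(n_windows),
        "rhat": float(rhat),
        "ess": float(ess),
    }


def passes(record, target_rhat=1.05, target_ess=50):
    """Criterion from the post: Rhat < cross_chain_rhat and ESS > cross_chain_ess."""
    return record["rhat"] < target_rhat and record["ess"] > target_ess


log = """\
[0] Iteration: 100 / 2000 [ 5%] (Warmup)
[0] iteration: 100 window: 1 / 1 Rhat: 1.02 ESS: 12.39
[0] iteration: 200 window: 1 / 2 Rhat: 1.01 ESS: 24.14
[0] iteration: 200 window: 2 / 2 Rhat: 1.02 ESS: 39.90
"""

records = [r for r in map(parse_test_line, log.splitlines()) if r is not None]
results = [passes(r) for r in records]
for rec, ok in zip(records, results):
    print(rec["iteration"], rec["window"], rec["rhat"], rec["ess"], ok)
```

With the defaults, all three tests in the sample fail on ESS alone (12.39, 24.14, and 39.90 are all below 50), which is why warmup continues past iteration 200.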
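For post-processing, a minimal sketch of collecting the per-process files might look like the following. The helper names mpi_output_files and read_draws are hypothetical; the sketch assumes the files sit in the working directory and follow the regular Stan CSV layout (comment lines starting with #, then a header row, then one row per draw), which we have not verified beyond the naming pattern above.

```python
import csv


def mpi_output_files(output_file, n_procs):
    """Expected per-process file names, following the mpi.<rank>.<file> pattern."""
    return [f"mpi.{rank}.{output_file}" for rank in range(n_procs)]


def read_draws(lines):
    """Parse Stan CSV text: skip blank and '#' comment lines; the first remaining row is the header."""
    rows = csv.reader(
        line for line in lines if line.strip() and not line.startswith("#")
    )
    header = next(rows)
    return [dict(zip(header, map(float, row))) for row in rows]


print(mpi_output_files("aaa.csv", 4))
# ['mpi.0.aaa.csv', 'mpi.1.aaa.csv', 'mpi.2.aaa.csv', 'mpi.3.aaa.csv']
```

Each file can then be passed to read_draws as an open file object, for example to pull out the lp__ column per chain.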
cmdstan arguments
./radon help-all | grep "cross_chain.*=" -A 4
should show four additional arguments to cmdstan under the adapt category
num_cross_chains=<unsigned int>
Number of chains in cross-chain warmup iterations
Valid values: num_cross_chains < number of MPI processes
Defaults to number of MPI processes
--
cross_chain_window=<unsigned int>
Window size for cross-chain warmup
Valid values: All
Defaults to 100
--
cross_chain_rhat=<double>
Target Rhat for cross-chain warmup
Valid values: 0.8 < cross_chain_rhat
Defaults to 1.05
--
cross_chain_ess=<unsigned int>
Target ESS for cross-chain warmup
Valid values: All
Defaults to 50
In particular
- Every cross_chain_window iterations the convergence test is performed.
- The number of chains defaults to the number of MPI processes. One can specify it manually using num_cross_chains, but it must be less than or equal to the number of MPI processes.
- When warmup is deemed sufficient, each chain’s stepsize is overwritten with the chain-average stepsize, and the mass matrix is recalculated using draws from all the chains.
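The last step can be sketched as follows. This is only an illustration under stated assumptions: the average is taken to be a plain arithmetic mean, the diagonal mass matrix is taken to be the per-parameter sample variance of the pooled draws, and any regularization Stan applies to the variance estimate is left out. The function name pooled_adaptation is hypothetical.

```python
from statistics import fmean, variance


def pooled_adaptation(stepsizes, chain_draws):
    """Combine per-chain adaptation results at the end of cross-chain warmup.

    stepsizes:   one adapted stepsize per chain
    chain_draws: chain_draws[c][i][j] is parameter j of draw i in chain c
    Returns (average stepsize, diagonal metric as a list of variances).
    """
    # Assumption: "chain average" means the arithmetic mean across chains.
    avg_stepsize = fmean(stepsizes)

    # Pool the draws of all chains, then take the sample variance per parameter.
    pooled = [draw for chain in chain_draws for draw in chain]
    n_params = len(pooled[0])
    diag_metric = [variance([draw[j] for draw in pooled]) for j in range(n_params)]
    return avg_stepsize, diag_metric


# Two chains, two draws each, two parameters (made-up numbers).
stepsize, metric = pooled_adaptation(
    [0.1, 0.3],
    [
        [[0.0, 1.0], [2.0, 1.0]],
        [[4.0, 3.0], [6.0, 3.0]],
    ],
)
print(stepsize, metric)
```

Every chain would then start sampling with the same stepsize and metric, rather than with its own single-chain estimates.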
Caveats
- Rhat and ESS in the tests are based on lp__ only.
- Currently ESS is not calculated as in Stan but is based only on single-chain draws. A cross-chain version is on the way.
- Currently only implemented for adapting diagonal metrics. Dense metrics are on the way.
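To make the first caveat concrete, here is a minimal sketch of an Rhat computed from lp__ draws alone. It uses the classic Gelman-Rubin potential scale reduction for equal-length chains, which is an assumption for illustration; it is not necessarily the exact formula used in this branch, nor the split or rank-normalized variants used elsewhere in Stan. The function name rhat is hypothetical.

```python
from statistics import fmean, variance


def rhat(chains):
    """Classic potential scale reduction for equal-length chains of a scalar quantity."""
    n = len(chains[0])
    chain_means = [fmean(chain) for chain in chains]
    within = fmean([variance(chain) for chain in chains])  # W: mean within-chain variance
    between = n * variance(chain_means)                    # B: between-chain variance
    var_plus = (n - 1) / n * within + between / n          # pooled variance estimate
    return (var_plus / within) ** 0.5


# Made-up lp__ draws: two chains that agree, and two that are far apart.
agree = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
apart = [[0.0, 1.0, 2.0, 3.0], [10.0, 11.0, 12.0, 13.0]]
print(rhat(agree), rhat(apart))
```

Because only lp__ enters the test, individual parameters may still mix poorly even after the test passes.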
We’d like to hear your comments and feedback!