I’ve been working with @bbbales2 and @avehtari to materialize the campfire algorithm using MPI. The implementation is experimental, and we’d like to hear from you how the algorithm performs for your models.
The implementation touches all three Stan components (cmdstan, stan, stan_math). One can retrieve them all using
git clone --recursive --branch mpi_warmup_framework https://github.com/stan-dev/cmdstan.git
MPI is required to compile and run. @bbbales2 points out that installing MPI on Ubuntu was:
sudo apt install libopenmpi-dev
The MPI include path also has to be set. Add the following to make/local:
MPI_ADAPTED_WARMUP = 1
CXXFLAGS += -isystem /path/to/mpi/headers
Make sure STAN_MPI is not used here. Note that you may also need
TBB_CXX_TYPE= # cpp compiler
as required by TBB, though it’s irrelevant to this discussion.
In cmdstan, compile the included radon model. Then, depending on the MPI library (MPICH or OpenMPI; we haven’t tested it on others),
mpiexec -n 4 -l ./radon sample data file=radon.data.R # MPICH
mpiexec -n 4 --tag-output ./radon sample data file=radon.data.R # OpenMPI
runs the model with 4 processes. By default this generates 4 chains that communicate with each other during warmup. As shown in the original proposal, the chains exchange information in order to calculate Rhat and ESS, which are used to decide whether warmup has been sufficient so that sampling can start.
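For anyone who wants a concrete picture of such a check, here is a rough Python sketch. It is not the C++ code in the branch: it uses the classic between/within-chain Rhat formula, takes ESS as a given number, and assumes the test passes when Rhat is below its target and ESS is above its target. The function names are made up for illustration; the defaults 1.05 and 50 are the ones reported by help-all.

```python
import math
import statistics


def rhat(chains):
    """Classic potential scale reduction factor for one scalar quantity.

    chains: list of equal-length lists of draws, one list per chain.
    """
    n = len(chains[0])
    # Within-chain variance: average of the per-chain sample variances.
    w = statistics.fmean([statistics.variance(c) for c in chains])
    # Between-chain variance: n times the sample variance of the chain means.
    b = n * statistics.variance([statistics.fmean(c) for c in chains])
    var_plus = (n - 1) / n * w + b / n
    return math.sqrt(var_plus / w)


def warmup_sufficient(rhat_value, ess_value, target_rhat=1.05, target_ess=50):
    """Assumed pass rule: both targets must be met (hypothetical helper)."""
    return rhat_value < target_rhat and ess_value > target_ess


if __name__ == "__main__":
    # Two chains stuck in different regions give an Rhat far above 1.
    print(rhat([[0, 1, 2, 3], [10, 11, 12, 13]]))
    # Rhat is fine here, but the ESS is still below the default target of 50.
    print(warmup_sufficient(1.02, 12.39))
```

Under this reading, a window with a good Rhat but a small ESS does not end warmup, which is why both numbers are printed for every window.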
The stdout would look like
Iteration: 1 / 2000 [ 0%] (Warmup)
Iteration: 1 / 2000 [ 0%] (Warmup)
Iteration: 1 / 2000 [ 0%] (Warmup)
Iteration: 1 / 2000 [ 0%] (Warmup)
Iteration: 100 / 2000 [ 5%] (Warmup)
Iteration: 100 / 2000 [ 5%] (Warmup)
Iteration: 100 / 2000 [ 5%] (Warmup)
Iteration: 100 / 2000 [ 5%] (Warmup)
iteration: 100 window: 1 / 1 Rhat: 1.02 ESS: 12.39
Iteration: 200 / 2000 [ 10%] (Warmup)
Iteration: 200 / 2000 [ 10%] (Warmup)
Iteration: 200 / 2000 [ 10%] (Warmup)
Iteration: 200 / 2000 [ 10%] (Warmup)
iteration: 200 window: 1 / 2 Rhat: 1.01 ESS: 24.14
iteration: 200 window: 2 / 2 Rhat: 1.02 ESS: 39.90
Iteration: 300 / 2000 [ 15%] (Warmup)
Iteration: 300 / 2000 [ 15%] (Warmup)
Each line is prefixed with the MPI proc ID. Every cross_chain_window iterations (see below), the convergence test output shows the iteration, window, Rhat, and ESS. The test is passed when the Rhat and ESS targets (see below) are met.
The above run generates 4 output files. With output file=aaa.csv, the MPI output would be mpi.0.aaa.csv through mpi.3.aaa.csv, similar to 4 separate regular sampling outputs.
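If you post-process the results with a script, the per-process file names can be generated rather than typed. A tiny Python sketch, assuming the pattern is mpi.<proc ID>.<output file> with proc IDs starting at 0 (inferred from the mpi.3.aaa.csv example with 4 processes, not checked against the code); the helper name is hypothetical.

```python
def mpi_output_files(output_file, n_procs):
    """Expected per-process output names, assuming 'mpi.<id>.<file>' naming."""
    return [f"mpi.{proc_id}.{output_file}" for proc_id in range(n_procs)]


if __name__ == "__main__":
    # List the files expected from a 4-process run with output file=aaa.csv.
    for name in mpi_output_files("aaa.csv", 4):
        print(name)
```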
./radon help-all | grep "cross_chain.*=" -A 4
should show four additional arguments:
num_cross_chains=<unsigned int>
  Number of chains in cross-chain warmup iterations
  Valid values: num_cross_chains < number of MPI processes
  Defaults to number of MPI processes
--
cross_chain_window=<unsigned int>
  Window size for cross-chain warmup
  Valid values: All
  Defaults to 100
--
cross_chain_rhat=<double>
  Target Rhat for cross-chain warmup
  Valid values: 0.8 < cross_chain_rhat
  Defaults to 1.05
--
cross_chain_ess=<unsigned int>
  Target ESS for cross-chain warmup
  Valid values: All
  Defaults to 50
- Every cross_chain_window iterations the convergence test is performed.
- The number of chains defaults to the number of MPI procs. One can specify it manually using num_cross_chains, but it must be less than or equal to the number of MPI procs.
- When warmup is deemed sufficient, each chain’s stepsize is overwritten with the chain-average stepsize, and the mass matrix is recalculated using draws from all the chains.
- ESS in the tests is not calculated as in Stan; it is based only on single-chain draws. A cross-chain version is on the way.
- Currently only implemented for adapt diag metrics. Dense metrics are on the way.
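To illustrate the stepsize and mass-matrix note above, here is a rough Python sketch for the diag case. It is only a reading of the description, not the branch's code: it assumes an arithmetic mean for the stepsize and a plain per-parameter sample variance of the pooled draws, without any regularization the real adaptation may apply. The function names are hypothetical.

```python
import statistics


def average_stepsize(stepsizes):
    """Arithmetic mean of the per-chain stepsizes (assumed averaging rule)."""
    return statistics.fmean(stepsizes)


def pooled_diag_metric(chain_draws):
    """Per-parameter sample variance over the draws of all chains.

    chain_draws: list of chains; each chain is a list of draws;
    each draw is a list with one value per parameter.
    """
    # Pool the draws from every chain before estimating the variances.
    pooled = [draw for chain in chain_draws for draw in chain]
    n_params = len(pooled[0])
    return [statistics.variance([draw[j] for draw in pooled])
            for j in range(n_params)]


if __name__ == "__main__":
    print(average_stepsize([0.1, 0.2, 0.3, 0.4]))
    print(pooled_diag_metric([[[1.0, 10.0], [2.0, 20.0]],
                              [[3.0, 30.0], [4.0, 40.0]]]))
```

Pooling the draws gives each chain a variance estimate built from all chains' draws rather than only its own, which is the point of sharing information during warmup.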
We’d like to hear your comments and feedback!