I have connected a Stan program to MPI in hopes of a speedup. I have 10 nodes running (verified by watching their CPUs max out), but I am not seeing a 10x speedup. I do have a small amount of network latency, so I'm not sure if that's throttling me a bit. Here is the output of the running program:
In order to use multi-processing with MPI in a Stan model, the model must be rewritten to use the map_rect function. With MPI, the model can be parallelized across multiple cores or a cluster.
Hello, I rewrote my function to use map_rect(), and MPI is configured.
When it comes to this step:
fit = model.sample(data=stan_data, **rc.sample_kwargs)
I now replace it with:
fit = cmdstanpy.from_csv('output.csv') (the output.csv is the output from the MPI process and seems to be populated with all the outputs)
I get this error:
Invalid or corrupt Stan CSV output file.
How do I get an MPI output file I can use for fitting? Thanks,
It appears that from_csv() is a bit brittle: the subprocess.run() generation of csv_output.csv creates a CSV file, but it may not be exactly what cmdstanpy expects (for example, there are true/false values instead of the 1/0 values that from_csv() wants).
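One possible workaround for the true/false issue mentioned above is to normalize the CSV before handing it to from_csv(). This is a sketch, not part of cmdstanpy; the helper name is made up, and the substitution covers only the true/false mismatch from the post above:

```python
import re

def sanitize_stan_csv(text: str) -> str:
    """Replace bare true/false tokens with 1/0 so a Stan-CSV reader that
    expects numeric values can parse the file.
    (Hypothetical helper; not part of cmdstanpy.)"""
    text = re.sub(r"\btrue\b", "1", text)
    return re.sub(r"\bfalse\b", "0", text)

# Usage sketch (assumes cmdstanpy is installed and output.csv exists):
# with open("output.csv") as f:
#     cleaned = sanitize_stan_csv(f.read())
# with open("output_clean.csv", "w") as f:
#     f.write(cleaned)
# fit = cmdstanpy.from_csv("output_clean.csv")
```

This only addresses the token mismatch the post identifies; if the MPI-generated CSV differs from CmdStan's format in other ways, those would need separate fixes.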
Using cpp_options as seen above doesn't run the other nodes in my MPI network.
The subprocess.run() call worked as intended, but I can't seem to get cmdstanpy to create the fit from its output. Has anyone been able to get MPI working with this?
Hi, @e32432423.
What platform are you running on? I believe this is much more challenging with Windows.
You can use multiple threads within a single machine, or MPI across machines. Usually it’s better to scale up on one machine until you run out of room and only then scale out using MPI, which can be much slower due to network latency. MPI is only going to pay off across machines if you have more compute to do than network latency, which will depend on the Stan program and the hardware setup.
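For the scale-up-on-one-machine route, a minimal sketch (assuming cmdstanpy is installed; the model and data names below are hypothetical placeholders):

```python
# STAN_THREADS is a real CmdStan compile-time make flag that enables
# within-chain threading; the model/data paths below are placeholders.
cpp_options = {"STAN_THREADS": True}

# import cmdstanpy
# model = cmdstanpy.CmdStanModel(stan_file="my_model.stan",
#                                cpp_options=cpp_options,
#                                force_compile=True)
# # threads_per_chain is a real cmdstanpy sample() argument:
# fit = model.sample(data=stan_data, chains=4, threads_per_chain=4)
```

With a map_rect model this lets each chain spread its shards over threads on one machine, with no network latency involved.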
It generally helps us debug if you provide a reproducible example. It looks like you're running at least something through Python?
and then
fit = cmdstanpy.from_csv('output.csv'), but from_csv() is brittle and the CSV output by subprocess.run() doesn't exactly match what from_csv() is looking for (e.g., true/false instead of 1/0, and other things)
I think this is a general cmdstanpy + MPI issue. I cannot change the title. There are two possible solutions:
ensure the community can run the command:
mpiexec -n 4 -f node_config_file model.exe_file
within the cmdstanpy syntax, where node_config_file tells how many processes to run per node, or
have cmdstanpy.from_csv() read the CSV output by the subprocess.run() command in the post above.
I would imagine addressing either of these would solve this general problem for anyone trying to use MPI + cmdstanpy. There is no obvious cmdstanpy equivalent of sample_mpi() (with its mpi_args argument) from the cmdstanr world.
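In the meantime, one way to approximate a sample_mpi()-style interface is to assemble the mpiexec command yourself and launch it outside cmdstanpy. The helper below is a hypothetical sketch (the function and argument names are made up, not cmdstanpy API), with the actual launch left as a commented subprocess.run() call since it needs a working MPI install:

```python
def build_mpi_cmd(exe_path, n_procs, node_config=None, sample_args=()):
    """Assemble an mpiexec argv list of the form
        mpiexec -n 4 -f node_config_file model.exe_file ...
    (Hypothetical helper, not part of cmdstanpy.)"""
    cmd = ["mpiexec", "-n", str(n_procs)]
    if node_config is not None:
        cmd += ["-f", node_config]
    cmd.append(exe_path)
    cmd += list(sample_args)
    return cmd

# Launch sketch (requires MPI and a compiled STAN_MPI model):
# import subprocess
# subprocess.run(build_mpi_cmd("./model", 4, "node_config_file"),
#                check=True)
```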
My cmdstanpy version is 1.2.2. I notice there is a 1.2.4; I will try it. But
what is the syntax to run a multi-node MPI command such as the following in cmdstanpy, like cmdstanr allows?
mpiexec -n 4 -f node_config_file model.exe_file
I tried:
cpp_options = {
    'STAN_MPI': True,
    'CXX': 'mpicxx',
    'TBB_CXX_TYPE': 'gcc',
    'mpiexec': '-n 4 -f node_config_file'
}
model = cmdstanpy.CmdStanModel(stan_file=stan_file_path, cpp_options=cpp_options, force_compile=True)
fit = model.sample(data=stan_data)
and it doesn't launch MPI as expected.
If that's not possible, then will 1.2.4 resolve cmdstanpy.from_csv() being brittle? The subprocess.run() invocation of the executable (see the post above) runs the MPI as expected, but the resulting 'output.csv' is not readable by cmdstanpy.from_csv().
The discussion above has covered running the .exe file with an mpi command, parallelizing within num_chains=1 via n_shard and map_rect. This helps with gradient evaluation time.
Is there a way to parallelize to decrease the time per iteration? Does an iteration include many serial gradient evaluations?
Is there also a way to parallelize across chains when num_chains > 1? Currently it runs serially: chain 1's iterations, then chain 2's iterations... I would imagine that is a parallelizable opportunity.
If you're already using map_rect, there aren't really any additional opportunities for parallelism within one iteration. There are some (potentially wasteful) options discussed here, but they are not implemented.
Yes, chains are definitely parallelizable. You can also do what cmdstanpy and friends used to do and wrap your subprocess calls in something from Python's multiprocessing module.
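A runnable sketch of that idea, supervising several external chain processes at once. A thread pool (rather than multiprocessing) is enough here, since the actual work happens in the child processes; the placeholder commands just print, because the real invocation would be your compiled model (or an mpiexec command), one output file per chain:

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run_chain(cmd):
    # Each chain is an independent OS process; the thread only waits on it.
    return subprocess.run(cmd).returncode

# Placeholder commands; in practice each would be a full CmdStan (or
# mpiexec) invocation writing to a distinct output CSV per chain.
cmds = [[sys.executable, "-c", f"print('chain {i} done')"] for i in range(2)]

with ThreadPoolExecutor(max_workers=2) as pool:
    return_codes = list(pool.map(run_chain, cmds))
```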