Hello all,
I am trying to launch a cmdstanpy sampling using MPI with reduce_sum and I have a few questions about the correct implementation.
I modified the make/local file and built the binaries, but still have a few questions:
Should I compile the model with STAN_THREADS=true and set os.environ["STAN_NUM_THREADS"] = "n" (as if I were multithreading on a single node), or not?
Should I submit only the sampling process as a cluster MPI job, or can it be the entire process, including model compilation?
For some reason, the progress bar goes away when I run the sampling with MPI. Is there a way to bring it back? (I already pass show_progress=True.)
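For reference, the make/local settings involved would look something like the fragment below. This is a sketch assuming a GCC toolchain; mpicxx and TBB_CXX_TYPE may differ on your cluster, so check your site's modules.

```make
# make/local
STAN_MPI=true          # build MPI support (used by map_rect)
CXX=mpicxx             # compile through the MPI compiler wrapper
TBB_CXX_TYPE=gcc       # underlying compiler family for the TBB build
STAN_THREADS=true      # only needed if you also want threading (reduce_sum)
```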
It can be the entire process, including model compilation, or you can compile the model yourself and pass in the exe file location. Here is an example script to run a single chain per node on a cluster:
# Use CmdStanPy to run one chain
# Required args:
#  - cmdstan_path
#  - model_exe
#  - seed
#  - chain_id
#  - output_dir
# Optional:
#  - data_file
import sys

from cmdstanpy import CmdStanModel, set_cmdstan_path

usage = """\
run_chain.py <cmdstan_path> <model_exe> <seed> <chain_id> <output_dir> (<data_path>)\
"""

def main():
    if len(sys.argv) < 6:  # script name plus the 5 required args
        print("missing arguments")
        print(usage)
        sys.exit(1)
    a_cmdstan_path = sys.argv[1]
    a_model_exe = sys.argv[2]
    a_seed = int(sys.argv[3])
    a_chain_id = int(sys.argv[4])
    a_output_dir = sys.argv[5]
    a_data_file = None
    if len(sys.argv) > 6:
        a_data_file = sys.argv[6]
    set_cmdstan_path(a_cmdstan_path)
    mod = CmdStanModel(exe_file=a_model_exe)
    fit = mod.sample(
        chains=1,
        chain_ids=[a_chain_id],
        seed=a_seed,
        output_dir=a_output_dir,
        data=a_data_file,
    )
    print(fit)

if __name__ == "__main__":
    main()
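To go with the script above, here is a minimal sketch of how a launcher might build one command line per chain (the `chain_command` helper is hypothetical, not part of cmdstanpy; each command could then go to `subprocess.run` or to one task in a SLURM job array):

```python
import sys

def chain_command(cmdstan_path, model_exe, seed, chain_id, output_dir, data_file=None):
    """Build the argv for one run_chain.py invocation (one chain per job/rank)."""
    cmd = [sys.executable, "run_chain.py", cmdstan_path, model_exe,
           str(seed), str(chain_id), output_dir]
    if data_file is not None:
        cmd.append(data_file)
    return cmd

# Example: the argv for chain 1
print(chain_command("/opt/cmdstan", "./model", 1234, 1, "out", "data.json"))
```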
I am so sorry, but now I am even more confused. Based on the thread, I understand that MPI + reduce_sum won't work? Is that really the case?
Thanks! I just tried it with 120 cores, and many of the jobs failed because they all tried to compile the model at the same time. So it is probably a better idea to pass a precompiled model :)
Not sure if this makes sense, but would it be possible to use MPI for parallelization instead of reduce_sum? How would that work? If within-chain parallelization with reduce_sum cannot be used with MPI, is the idea of MPI to run dozens of chains in parallel? Or does MPI only make sense with map_rect?
Thanks! That is true in most cases for reduce_sum, but without MPI, reduce_sum is limited to the number of cores on a single node, which might not be enough. I guess it is back to map_rect for me.
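Since map_rect requires the data packed into equal-sized (rectangular) shards, much of the work is bookkeeping on the Python side before sampling. A minimal sketch of that step, with a hypothetical helper not part of cmdstanpy:

```python
def make_shards(values, n_shards):
    """Split a flat list of observations into equal-sized shards for map_rect.
    map_rect requires rectangular data, so here the length must divide evenly;
    real code would pad the last shard instead of raising."""
    if len(values) % n_shards != 0:
        raise ValueError("pad the data so shards are equal-sized")
    size = len(values) // n_shards
    return [values[i * size:(i + 1) * size] for i in range(n_shards)]

print(make_shards(list(range(6)), 3))  # → [[0, 1], [2, 3], [4, 5]]
```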