I am trying to follow the MPI development so far. I understand this is still not released, but I would like to try MPI Stan + cmdstan.
However, I am a little lost as to which active branches to pick up for the recent development. Can someone walk me through the steps required to set up MPI Stan with cmdstan?
However, as mentioned on the wiki, if you’re working on a single machine you can get similar results relatively painlessly using threading with the develop branch of cmdstan: https://github.com/stan-dev/math/wiki/Threading-Support.
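Roughly, the recipe on that wiki page boils down to the following for cmdstan (a sketch only; the wiki is authoritative for the exact flags, and my_model is just a placeholder name):
# in cmdstan’s make/local: compile with threading enabled in stan-math
CXXFLAGS += -DSTAN_THREADS
# at run time, tell map_rect how many threads it may use (-1 means all available cores)
STAN_NUM_THREADS=4 ./my_model sample data file=my_model.data.R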
The notes on threading only mention that the autodiff stack is thread local. Does this mean that threads accessing the same data will reuse the same cache lines?
Edit: by data I mean the variables declared in the data block of the Stan model.
Yes, that’s correct. However, the thread_local feature of C++11 will cost you about 10-20% performance (when comparing single-core runs).
I haven’t yet compared MPI vs. threading on a single machine.
Just one clarification: there is only a single map_rect, which will use MPI or threading, whichever is enabled. MPI is given preference over threading if both are enabled (though running with both on hasn’t been tested in the wild).
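To make that concrete: the same model source, with its map_rect calls unchanged, is simply launched differently depending on how it was built (foo is a placeholder name and the counts are just examples; see the setup snippets elsewhere in this thread):
# built with threading: parallelism is controlled by an environment variable
STAN_NUM_THREADS=4 ./foo sample
# built with STAN_MPI: parallelism is controlled by the MPI launcher
mpirun -np 4 ./foo sample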
We are very close to getting everything into develop. Right now I am fighting some build issues, but it should not take much longer.
If you are keen on using MPI, just use the cmdstan branch feature/issue-616-mpi… but please wait a moment, as this branch is broken right now: I still need to fix some of the makefiles that live in stan-math. The PR for that will hopefully go in soon, and then cmdstan will be ready as well… at least that is the plan.
If you want to try this out now, better stick with the threading approach, which you can already use with the develop branch of cmdstan. Once the MPI branch above turns “green” in testing (you can see that status on the PR page), you may want to switch to it.
I’d guess MPI wins on speed but threads on ease of setup of course. It’s always good to have choices. I’d like to use this on a Xeon Phi system, so I’ll try both.
I am not sure which is faster… for MPI, all parameters and all results need to be copied between the different processes. For threading, this communication “friction” is almost entirely absent. So I am certainly curious to see benchmarks.
Took longer than anticipated, but if you check out the cmdstan branch feature/issue-616-mpi now, you will get a fully working MPI cmdstan (don’t forget make stan-update).
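For a fresh checkout, that amounts to roughly the following (a sketch; adjust the remote and paths to your setup):
git clone https://github.com/stan-dev/cmdstan.git
cd cmdstan
git checkout feature/issue-616-mpi
make stan-update   # pulls in the matching stan / stan-math submodules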
Just to clarify: based on the .travis.yml on that branch, with a fresh checkout, I should run
make CXX="mpicxx" STAN_MPI=true build
(assuming g++ is the only available compiler), and then, assuming a model file map_rect_model.stan that uses the map_rect function, it should also be compiled with the MPI options,
make CXX="mpicxx" STAN_MPI=true map_rect_model
and run with mpirun
mpirun -np 2 ./map_rect_model sample $options
?
(sorry if this is documented or otherwise obvious)
That’s not quite what we do on Travis (maybe also look at the Jenkinsfile to see how MPI is set up in automation). Have a look here (note that the make/local change described for stan-math needs to go into the make/local for cmdstan):
You really only have to put the following into make/local:
STAN_MPI=TRUE
CC=mpicxx
This sets up the build system. In case MPI was built on your system with a compiler you don’t like, you can point the build at a different compiler.
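For example (an illustration only, not the original snippet: both OpenMPI and MPICH let you swap the compiler behind their mpicxx wrapper via an environment variable, and clang++ here is just a placeholder):
# OpenMPI: have mpicxx invoke clang++ instead of the compiler MPI was built with
export OMPI_CXX=clang++
# the MPICH equivalent
export MPICH_CXX=clang++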
However, I cannot really recommend changing the compiler like this, since Stan then uses a different compiler than the one MPI was built with. That said, we do test stan-math this way, so it seems to work.
Then you just start your Stan program foo with
mpirun -np 2 ./foo
and that’s all there is to it (in theory). Let me know should you run into trouble.