Yes, that’s correct. However, the thread_local feature of C++11 will cost you about 10-20% performance (when comparing single-core runs).
I haven’t yet compared MPI vs. threading on a single machine.
Just one clarification: there is only a single map_rect, which will use MPI or threading, whichever is enabled. MPI is given preference over threading in case both are enabled (though running with both on hasn’t been tested in the wild).
We are very close to getting everything into develop. I am fighting some build issues right now, but I would say we are almost there.
If you are keen on using MPI, just use the cmdstan branch feature/issue-616-mpi… but please wait a moment, as this branch is broken right now; I still need to fix some of the makefiles which reside in stan-math. The PR for that is hopefully going in soon, and then we will have cmdstan ready as well… at least that is the plan.
If you want to try this out now, better stick with the threading support, which you can already use on the develop branch of cmdstan. Once the MPI branch above turns “green” in terms of testing (you can get that status from the PR page), you may want to switch over.
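For reference, enabling threading is just a matter of a compiler define. A minimal sketch (assuming the develop branch of cmdstan; STAN_NUM_THREADS is the environment variable map_rect reads at runtime, with -1 meaning all available cores):

# add to make/local of cmdstan to compile with threading support
CXXFLAGS += -DSTAN_THREADS
# at runtime, pick the number of threads
export STAN_NUM_THREADS=4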
I am not sure what is faster… with MPI, all parameters and all results need to be copied between the different processes. With threading, this communication “friction” is almost absent. So I am certainly curious to see benchmarks.
That’s not quite what we do on Travis (maybe also look at the Jenkinsfile to see how MPI is set up in automation). Have a look here (note that the make/local changes for stan-math need to go into the make/local of cmdstan):
You really only have to put into make/local:
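Something along these lines (a sketch, assuming stan-math’s STAN_MPI flag and the usual mpicxx compiler wrapper of your MPI installation):

# make/local: enable the MPI backend and build with the MPI wrapper
STAN_MPI=true
CXX=mpicxx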
This sets up the build system. In case MPI was built on your system with a compiler you don’t like, then do the following:
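For example (an assumption on my part: OMPI_CXX is the OpenMPI-specific way to point the mpicxx wrapper at another underlying compiler; MPICH uses MPICH_CXX instead):

# make the mpicxx wrapper use a different underlying compiler (OpenMPI only)
export OMPI_CXX=clang++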
However, I cannot really recommend changing the compiler like this, as Stan then uses a different compiler than the one MPI was built with. That said, we do test stan-math this way, so it seems to work.
Then you just start your Stan program foo with
mpirun -np 2 ./foo
and that’s all there is to it (in theory). Let me know should you run into trouble.
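A complete cmdstan invocation would then look something like this (foo and foo.data.R are placeholder names):

# launch the model with 4 MPI processes; cmdstan arguments follow as usual
mpirun -np 4 ./foo sample data file=foo.data.R output file=output.csv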