Yes, that’s correct. However, the thread_local feature of C++11 will cost you about 10-20% in performance (comparing single-core runs).
I haven’t yet compared MPI vs threading on a single machine.
Just one clarification: there is only a single map_rect, which will use MPI or threading, whichever is enabled. MPI is given preference over threading in case both are enabled (though this hasn’t been tested in the wild with both on).
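To make the preference rule concrete, here is a minimal sketch of a single entry point that picks its backend at compile time. This is not the actual stan-math code: the names Backend, select_backend and active_backend are made up for illustration, and I am assuming the build defines macros called STAN_MPI and STAN_THREADS when the respective feature is switched on.

```cpp
#include <cassert>

// Hypothetical illustration only; not the real stan-math implementation.
enum class Backend { mpi, threads, serial };

// MPI wins whenever it is enabled; threading is used only if MPI is off;
// otherwise everything runs serially.
constexpr Backend select_backend(bool mpi_enabled, bool threads_enabled) {
  return mpi_enabled ? Backend::mpi
                     : (threads_enabled ? Backend::threads : Backend::serial);
}

// Assumption: the build system defines these macros when a feature is on.
#ifdef STAN_MPI
constexpr bool mpi_on = true;
#else
constexpr bool mpi_on = false;
#endif

#ifdef STAN_THREADS
constexpr bool threads_on = true;
#else
constexpr bool threads_on = false;
#endif

// The one and only entry point: callers never choose a backend themselves.
constexpr Backend active_backend() {
  return select_backend(mpi_on, threads_on);
}

// Small self-check of the precedence rule described above.
inline void check_precedence() {
  assert(select_backend(true, true) == Backend::mpi);
  assert(select_backend(false, true) == Backend::threads);
  assert(select_backend(false, false) == Backend::serial);
}
```

The point of the sketch is only that user code calls one function and the build flags decide what happens underneath, so switching from threading to MPI later should not require changes to the model.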
We are very close to getting everything into develop. I am fighting some build issues right now, but we are really close, I would say.
If you are keen on using MPI, just use the cmdstan branch feature/issue-616-mpi… but please wait a moment, as this branch is currently broken: I need to fix some of the makefiles which reside in stan-math. The PR for that is hopefully going in soon, and then we will have cmdstan ready as well… at least that is the plan.
If you want to try this out now, better stick with threading, which you can already use on the develop branch of cmdstan. Once the MPI branch above turns “green” in terms of testing (you can get that status from the PR page), you may want to switch to that.