MPI is to me one of the greatest features to come to Stan and I think we are ready to make it happen soon. Here is my proposal for moving forward in a step-wise manner:
The map_rect function should by now be final unless @Bob_Carpenter has any additional comments or concerns. The serial version should be included in stan-math ASAP (issue #686 on stan-math).
Exposing map_rect to the Stan language. This requires the parser to give the user-supplied function the usual special treatment (issue #2440 on stan).
Inclusion of the MPI base system, which includes only the basic mechanisms to send commands over the network. This should include a solution for testing MPI code; hence I would expect that this pull request also comes with changes to the test system to make it possible to test MPI-enabled code.
Inclusion of MPI enabled map_rect in stan-math
Inclusion of further parser changes needed to make things work (boost macros to ensure that MPI communication can take place). This needs to go into Stan.
Any comments on the above would be appreciated; I hope these steps are clear in what they mean. Let me know if not or if we should split things up further.
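To make step 1 concrete, here is a minimal serial sketch of what a map_rect could look like: it applies a user-supplied functor to the shared parameters plus each job's parameters and data, and concatenates the per-job outputs. The names, the plain-double types, and the functor interface here are illustrative only, not the final stan-math API (which involves Eigen types and var support):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative serial map_rect: apply functor f per job and
// concatenate the per-job output vectors into one long vector.
// (Sketch only; the real stan-math signature differs.)
template <typename F>
std::vector<double> map_rect_serial(
    const F& f,
    const std::vector<double>& shared_params,
    const std::vector<std::vector<double>>& job_params,
    const std::vector<std::vector<double>>& x_r,
    const std::vector<std::vector<int>>& x_i) {
  std::vector<double> out;
  for (std::size_t j = 0; j < job_params.size(); ++j) {
    std::vector<double> res = f(shared_params, job_params[j], x_r[j], x_i[j]);
    out.insert(out.end(), res.begin(), res.end());
  }
  return out;
}

// Hypothetical per-job functor used only for demonstration.
struct DemoF {
  std::vector<double> operator()(const std::vector<double>& shared,
                                 const std::vector<double>& theta,
                                 const std::vector<double>& /*x_r*/,
                                 const std::vector<int>& /*x_i*/) const {
    return {shared[0] + theta[0]};
  }
};
```

The important property for the later MPI version is that the jobs are independent, so this loop can be distributed across workers without changing the result.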
The MPI code can all be hidden in stan-math, yes. However, the stan parser needs to generate additional code that registers the user-supplied functions with the boost serialization library. The extra generated code can be placed inside a #ifdef STAN_HAS_MPI block to avoid additional MPI dependencies when compiling without MPI. See here for the needed definitions:
There is one bonus complication here: the type names registered with the boost serialization library must not be longer than 128 characters, a limit imposed by boost serialization (this is why I defined the mpi_call struct outside of the model namespace in the snippet above).
Bump. I just figured out how we can hide most of the needed declarations in neat C preprocessor macros. So all that needs to be generated is
STAN_REGISTER_MPI_MAP_RECT(mpi_call, double, double)
STAN_REGISTER_MPI_MAP_RECT(mpi_call, var, var)
and, in addition, a struct with a short name in the top-level namespace to make sure that we don’t exceed the boost serialization name-length limit. I hope everyone is fine with having C preprocessor macro definitions in stan-math headers…
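To illustrate how such a macro can hide the registration boilerplate, here is a simplified stand-in. The real STAN_REGISTER_MPI_MAP_RECT expands to boost serialization export declarations; this demo macro (a hypothetical name, not the actual stan-math one) merely records a factory under the functor's short name in a registry, so the pattern of "one macro call per functor, details hidden in the header" is visible:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// A per-job worker takes job parameters and returns outputs.
using worker_fn = std::function<std::vector<double>(const std::vector<double>&)>;

// Global registry keyed by the (short) registered name.
inline std::map<std::string, worker_fn>& mpi_registry() {
  static std::map<std::string, worker_fn> reg;
  return reg;
}

// Demo macro: registers functor F under the stringized name #F at
// static-initialization time. Stand-in for boost serialization export.
#define STAN_REGISTER_MPI_MAP_RECT_DEMO(F)                        \
  namespace {                                                     \
  [[maybe_unused]] const bool registered_##F = [] {               \
    mpi_registry()[#F] = [](const std::vector<double>& theta) {   \
      return F()(theta);                                          \
    };                                                            \
    return true;                                                  \
  }();                                                            \
  }

// Example user functor with a deliberately short top-level name.
struct mpi_call {
  std::vector<double> operator()(const std::vector<double>& theta) const {
    return {2 * theta[0]};
  }
};

STAN_REGISTER_MPI_MAP_RECT_DEMO(mpi_call)
```

The short top-level struct name matters precisely because of the 128-character limit mentioned above: a deeply nested model-namespace type name can easily exceed it.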
One way to get rid of all MPI dependencies in stan would be to work with macros: the stan parser generates, for each map_rect call it sees, a call to a macro, let’s say STAN_REGISTER_MAP_RECT.
The definition of that macro will live in stan-math. Depending on the availability of MPI, different things will be defined for it: the non-MPI version will probably do nothing, while the MPI version will register all necessities with the boost serialization system.
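A sketch of how such a conditional definition could look in a stan-math header (the expansion shown for the MPI branch is illustrative only; the actual registration details are omitted):

```cpp
// Sketch: one macro, two definitions depending on the build.
#ifdef STAN_HAS_MPI
// MPI build: expand to the boost serialization registration for F
// (export declarations etc.; details omitted in this sketch).
#define STAN_REGISTER_MAP_RECT(F) /* register F with boost serialization */
#else
// Non-MPI build: the macro expands to nothing, so a model compiled
// without MPI carries no extra dependencies.
#define STAN_REGISTER_MAP_RECT(F)
#endif

// The parser-generated call is then harmless in a non-MPI build:
struct my_functor {};  // hypothetical user functor
STAN_REGISTER_MAP_RECT(my_functor)
```

The generated model code is identical either way; only the compile-time flag decides whether the registration machinery gets pulled in.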
I don’t know what properties are, but to me the macro solution is fine. It comes with the burden of having to include headers in a controlled order.
An alternative to macros would be to translate stan files into two separate header files. The first one being the usual one and the second one a header file which adds the MPI specific bits needed for compilation. The MPI enabled cmdstan would then need to pick that up in some clever way.
I thought about this one a bit and made the choice for the current design for these reasons:
the ODE functions would suggest using nested arrays for parameters and data, yes; however, the algebra solver uses Eigen vectors for the parameters and for the output of the function. So the algebra solver is inconsistent with the ODE functions, and I figured the algebra solver is the more modern of the two.
The code will likely get used a lot in hierarchical models, where you will very likely want to put a multivariate normal prior on the job-specific parameters. The array of vectors allows you to do just that.
Efficiency: internally I cast everything that is sent over MPI into bigger Eigen matrices. Having the user-supplied per-job function take Eigen vectors as input and return Eigen vectors as output greatly reduces the need for converting arrays into vectors and vice versa. The reason to only ship Eigen matrices over MPI is the observation that this gives much better performance: MPI deals much better with contiguous chunks of memory than with anything else (with nested arrays you have no guarantees on memory layout). Using Eigen matrices also makes my life a lot easier when coding all of this.
the data arguments need to be nested arrays because I want to process int and real data in the same way. As integers can only be handled by nested arrays, I then have to do the same with the real data. The conversion of these into Eigen structures only needs to be done once, so the array-to-Eigen and Eigen-to-array conversion happens just a single time.
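The packing step argued for above can be illustrated without any of the MPI machinery. This sketch (using a flat std::vector as a stand-in for an Eigen matrix, to stay self-contained) copies the per-job nested arrays once into a single contiguous column-major buffer, which is the kind of layout that can then be shipped over MPI efficiently:

```cpp
#include <cstddef>
#include <vector>

// Pack rectangular per-job parameter arrays into one contiguous
// column-major buffer: one column per job, one row per parameter.
// (Stand-in for building an Eigen matrix; all jobs are assumed to
// have the same number of parameters, hence "rect".)
std::vector<double> pack_column_major(
    const std::vector<std::vector<double>>& job_params) {
  const std::size_t cols = job_params.size();
  const std::size_t rows = cols ? job_params[0].size() : 0;
  std::vector<double> flat(rows * cols);
  for (std::size_t j = 0; j < cols; ++j)
    for (std::size_t i = 0; i < rows; ++i)
      flat[j * rows + i] = job_params[j][i];  // contiguous storage
  return flat;
}
```

Because the buffer is one contiguous block, a single MPI send suffices per transfer, whereas nested arrays would force either many small sends or a copy into exactly this kind of buffer anyway.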
The MPI resource management pull is up. Right now it’s still WIP since the MPI build stuff has not been merged yet, but if you want to get an idea of the MPI resource handling which I am planning, then have a look.