MPI todo list / Stan language specifics

Hi!

This post is mainly for @Matthijs, but I thought others might find it useful as well.

You sounded interested in helping out with MPI, which would be really great! The short-term needs are:

  1. Getting the build stuff into stan-math – I think Daniel is on that
  2. Getting the serial (non-MPI) version into stan-math
    Here, more tests need to be defined for the serial version
  3. The map_rect needs to be made available in the Stan language parser
  4. Add the mpi cluster management to stan-math (more tests needed)
  5. Add the MPI map_rect version (again more tests)
  6. Add the special handling that MPI requires to the Stan language parser:
    a. Ensure that the data passed into map_rect is static, which for Stan means that it is defined in either data or transformed data. The static data is cached on the workers and transmitted only once.
    b. Ensure that each map_rect call is accompanied by a call to the respective macro, which sets up what is needed for the MPI communication.
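
As a rough illustration of why 6a matters, here is a toy Python sketch of the caching idea. No real MPI is involved; the Worker class, the call-site id "call_0" and the scaled_sum function are all made up for this example and are not part of stan-math. The point is only that data which never changes has to be shipped once, while the parameters are sent on every evaluation.

```python
class Worker:
    """Toy stand-in for an MPI worker process (hypothetical, no real MPI)."""

    def __init__(self):
        self.cache = {}      # call-site id -> static data received so far
        self.transfers = 0   # number of times data was actually transmitted

    def run(self, call_id, params, fetch_data, f):
        # Receive the static data only the first time this call site is seen.
        if call_id not in self.cache:
            self.cache[call_id] = fetch_data()
            self.transfers += 1
        return f(params, self.cache[call_id])


def scaled_sum(params, data):
    # Hypothetical job function: scale the sum of the data by a parameter.
    return params[0] * sum(data)


worker = Worker()
static_data = [1.0, 2.0, 3.0]

# Parameters change on every evaluation, the data does not.
results = [
    worker.run("call_0", [scale], lambda: static_data, scaled_sum)
    for scale in (1.0, 2.0, 3.0)
]
```

If the data could change between evaluations, the cache would silently go stale, which is presumably why the parser has to guarantee that the data is static.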

The above steps give us the rather user-unfriendly map_rect function. The interface is not very nice for users, as they are forced to squash all data into a rectangular format. Have a look at the cmdstan branch (the call of map_rect is here: https://github.com/stan-dev/cmdstan/blob/ccf6977af300614aa2689cfee2afffecf3583931/examples/mpi/oral_2cmt_mpi4.stan#L461 and an example mpi_function is at line 215). This branch works with the stan-math branch concept-mpi-2.
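
To make the "squash everything into a rectangle" point concrete, here is a minimal Python sketch of the idea, not the stan-math implementation. The names pad_rows, map_rect_serial and job_mean_model are invented for this example, and the argument layout (shared parameters, per-job parameters, real data, integer data) is an assumption. Ragged per-job data is padded to a common width, and the true lengths travel in the integer data so that the job function can undo the padding.

```python
def pad_rows(rows, fill=0.0):
    """Pad ragged rows to a common width; also return the true lengths."""
    width = max(len(r) for r in rows)
    padded = [list(r) + [fill] * (width - len(r)) for r in rows]
    sizes = [[len(r)] for r in rows]
    return padded, sizes


def map_rect_serial(f, shared, job_params, x_r, x_i):
    """Apply f to each job's row of data and concatenate the outputs."""
    out = []
    for theta, xr, xi in zip(job_params, x_r, x_i):
        out.extend(f(shared, theta, xr, xi))
    return out


def job_mean_model(shared, theta, xr, xi):
    # Undo the padding: xi[0] holds the number of valid entries in xr.
    n = xi[0]
    values = xr[:n]
    return [shared[0] + theta[0] * sum(values) / n]


ragged = [[1.0, 2.0, 3.0], [4.0]]   # two jobs with different data sizes
x_r, x_i = pad_rows(ragged)
result = map_rect_serial(job_mean_model, [0.5], [[1.0], [2.0]], x_r, x_i)
```

Even in this tiny case the job function has to know how the data was packed, which is exactly the burden on the user.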

So the problem is that all the data declarations are squashed and need to be recreated inside the mpi_function. However, doing that requires great care, as defining any variable in a user function will put it on the AD stack.

The second version of this function could be along the lines described here.

The idea is that the user defines a functor with

  1. Shared parameters
  2. Job specific parameters
  3. An integer j which is the job number
  4. A flexible number of data arguments

The point is that the flexible number of data arguments would ensure that the data definitions can stay as they are. The downside is that we can then no longer slice the data for the user as we do now. Hence, we need to pass all the data into the function together with an integer j which indicates which job to calculate. The user then uses this job number to subset to the job-specific data.
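
A minimal Python sketch of what this second version could look like, purely to illustrate the calling convention. The names map_jobs and subject_model and the start/end index arrays are assumptions made for this example and not an agreed design. Every job receives all the data unsliced plus its job number j, and the user function does the subsetting itself.

```python
def map_jobs(f, shared, job_params, *data):
    """Call f once per job, passing all data unsliced plus the job number."""
    out = []
    for j, theta in enumerate(job_params, start=1):  # 1-based, as in Stan
        out.extend(f(shared, theta, j, *data))
    return out


def subject_model(shared, theta, j, doses, start, end):
    # The user picks out this job's observations from the full arrays.
    mine = doses[start[j - 1]:end[j - 1]]
    return [shared[0] * theta[0] * d for d in mine]


doses = [10.0, 20.0, 30.0]   # all observations, in their original layout
start = [0, 2]               # first observation of each job (0-based here)
end = [2, 3]                 # one past the last observation of each job
result = map_jobs(subject_model, [2.0], [[1.0], [0.5]], doses, start, end)
```

Compared with the rectangular interface, no padding or repacking is needed, at the price of the user doing the indexing by hand.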

I think this is possible to implement, but I can well imagine that we will find some kinks to work around. However, this version would be much more user-friendly, as the definitions can stay as they were. If this last version is not clear, let me know where I lost you and I am happy to fill in more details.

Thanks much for your help! MPI is making modeling so much more fun with Stan - it is really amazing.

Best,
Sebastian

Ok, so we are some steps further. I have created two pull requests:

  • bring in boost MPI & boost build, see here

  • add map_rect in the non-MPI version to stan-math, see here

BTW, the serial map_rect version already gives quite some speedup for large models, even on a single core, when compared against a vanilla Stan implementation (the AD graph is built differently in this case).