A few more things came up and comments are welcome:
Sticking the mpi-evaluate function into stan-math makes things a bit harder. My current prototype takes advantage of the fact that the workers run the Stan program in the same way as the root, so the data is “transferred” for me implicitly. We will lose this elegant solution and will need to worry about sending data from the root to the workers. OK by me, just more effort.
Can we go, for now, with an mpi thing which can only be called within the scope of the transformed data block, and only when we are sure that the data has not been modified after the transformed data block? In that case, we can always cache the data on the workers.
This mpi thing will be the first stateful function, and I plan to implement it with a singleton-type design. Other ideas?
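To make the singleton idea concrete, here is a minimal sketch of what a per-process data cache could look like. All names (`mpi_data_cache`, `call_id`, `store`, `get`) are made up for illustration and do not exist in stan-math; the sketch only caches real-valued data and leaves out the actual MPI transfer.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical per-process cache for data that the root ships once.
// None of these names are part of stan-math; they are illustrative only.
class mpi_data_cache {
 public:
  // Meyers singleton: the function-local static is constructed exactly
  // once, and C++11 guarantees that this initialization is thread-safe.
  static mpi_data_cache& instance() {
    static mpi_data_cache cache;
    return cache;
  }

  // A singleton must not be copied.
  mpi_data_cache(const mpi_data_cache&) = delete;
  mpi_data_cache& operator=(const mpi_data_cache&) = delete;

  // True if data for this call site has already been stored.
  bool is_cached(int call_id) const { return real_data_.count(call_id) > 0; }

  // Store the data of one call site; overwriting is treated as a bug.
  void store(int call_id, const std::vector<double>& x_r) {
    assert(!is_cached(call_id));
    real_data_[call_id] = x_r;
  }

  // Look up cached data; throws std::out_of_range if nothing was stored.
  const std::vector<double>& get(int call_id) const {
    return real_data_.at(call_id);
  }

 private:
  mpi_data_cache() = default;
  std::map<int, std::vector<double>> real_data_;
};
```

The assumption here is that each call site gets some unique integer id, so that a worker can tell on later evaluations that it already holds the data and only needs fresh parameters.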
Should we maybe go to a design where we have rectangular parameter blocks, but we send all data to all nodes along with the index i of the job? That would avoid the need for ragged arrays, since dealing with the raggedness is pushed onto the user. This design does not scale to huge datasets, but it would be a compromise until we have ragged arrays.
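As a rough illustration of what "pushed onto the user" would mean, here is a sketch of a user function that receives the full flattened data plus the job index i and picks out its own slice. The names (`job_log_lik`, `y_flat`, `sizes`) and the toy sum-of-squares term are my assumptions for the example, not part of any existing interface.

```cpp
#include <numeric>
#include <vector>

// Hypothetical user function for job i. Every node holds ALL the data:
// y_flat is the ragged data flattened into one vector, and sizes[j] is
// the number of observations that belong to job j.
double job_log_lik(double theta, int i, const std::vector<double>& y_flat,
                   const std::vector<int>& sizes) {
  // The slice of job i starts after the observations of jobs 0..i-1.
  const int start = std::accumulate(sizes.begin(), sizes.begin() + i, 0);
  double sum_sq = 0.0;
  for (int k = start; k < start + sizes[i]; ++k) {
    const double d = y_flat[k] - theta;
    sum_sq += d * d;
  }
  // Toy log-density kernel: -0.5 * sum of squared deviations.
  return -0.5 * sum_sq;
}
```

The parameters stay rectangular (one theta per job here), while the index bookkeeping for the ragged data lives entirely in user code.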
Can we have variadic functions? My idea here is the following: the function passed by the user would have the signature
real foo(real params, int i, ... data-arguments ...)
So I leave the data arguments flexible here. For any given user function this is of course a fixed set of arguments. The … data-arguments… can be int, real, or vectors of those. The Stan function itself would not be variadic, but the C++ implementation underneath would have to deal with it (in the Stan parser and in the mpi evaluation function in stan-math). As far as I understand, C++11 supports this kind of thing.
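On the C++ side, variadic templates (a C++11 feature) would cover this. Below is a minimal serial sketch, assuming one real parameter per job; `evaluate_jobs` and the functor `foo` are hypothetical names, and the MPI distribution itself is left out entirely.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical evaluation function: calls the user functor once per job
// and forwards the same pack of data arguments to every call.
template <typename F, typename... DataArgs>
std::vector<double> evaluate_jobs(const F& f,
                                  const std::vector<double>& params,
                                  const DataArgs&... data) {
  std::vector<double> out;
  out.reserve(params.size());
  for (std::size_t i = 0; i < params.size(); ++i) {
    // The pack expansion data... becomes the trailing arguments of f.
    out.push_back(f(params[i], static_cast<int>(i), data...));
  }
  return out;
}

// Example user function with a fixed signature:
//   real foo(real params, int i, vector x, int offset)
struct foo {
  double operator()(double param, int i, const std::vector<double>& x,
                    int offset) const {
    return param * x[static_cast<std::size_t>(i)] + offset;
  }
};
```

A call such as `evaluate_jobs(foo(), params, x, 3)` deduces the data argument types from the call site, so each user function still has a fixed signature and only the evaluation function is generic.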
Comments are very welcome.