With MPI we have to cut down on communication, so data should not be sent more than once. As I learned from you guys, the fact that something has the type "data" is not sufficient to assume that it will not change. Only data which has been declared in the transformed data block can be assumed to be static data.
In my second item, where I speak of "in transformed data block scope", I was asking if we can restrict our first map implementation to deal with static data only. As I understood, you can detect in the parser that this is the case.
BTW, assuming static data is not enough; we also need to assume a fixed number of parameter sets for each evaluation of the function. That way the N parameter sets are always mapped in a fixed manner to the M workers. This is an assumption in my prototype as well and makes my life a little easier. If we go to the all-data-to-all-workers design, then this assumption is not needed any more.
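To make the fixed N-to-M mapping concrete, here is a minimal sketch of one way to do it: contiguous blocks, with the remainder spread over the first workers. The names JobRange and partition_jobs are made up for illustration and are not taken from the prototype, which may well split things differently.

```cpp
#include <cstddef>
#include <vector>

// Half-open range [begin, end) of parameter-set indices owned by one worker.
// (Hypothetical helper type, not part of the prototype.)
struct JobRange {
  std::size_t begin;
  std::size_t end;
};

// Split num_sets parameter sets into num_workers contiguous blocks.
// The first (num_sets % num_workers) workers receive one extra set, so the
// mapping depends only on N and M and is the same on every evaluation.
std::vector<JobRange> partition_jobs(std::size_t num_sets,
                                     std::size_t num_workers) {
  std::vector<JobRange> ranges;
  if (num_workers == 0) return ranges;  // nothing to map onto
  ranges.reserve(num_workers);
  const std::size_t base = num_sets / num_workers;
  const std::size_t extra = num_sets % num_workers;
  std::size_t start = 0;
  for (std::size_t w = 0; w < num_workers; ++w) {
    const std::size_t len = base + (w < extra ? 1 : 0);
    ranges.push_back({start, start + len});
    start += len;
  }
  return ranges;
}
```

Since the ranges are a pure function of N and M, every node can compute them locally and no job layout has to be communicated.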
In my current prototype the model is set up on the root and on the workers. On all nodes the program will enter the transformed data block and eventually call the setup_mpi_function, see here. This setup function will only return on the root, while on all other nodes the program goes into a loop where it waits for parameters to process. The upside of such an approach is that the data is being transferred to the worker nodes automagically.
Having the transformed data stuff available on the worker nodes as static data which the user can access in the mpi worker function would be very convenient. Would this be doable now that we have closures with C++11? I know that there were discussions around this in the past, but we always decided against it.
What we need is a client / server structure where the server sends blocks of parameter sets to the clients and the clients return per parameter set the function value and the gradients. The server is the root process and the clients are the workers.
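A rough single-process sketch of that exchange, just to pin down what goes back and forth. The actual MPI send/receive is replaced by a plain function call here, and all names (ParamSet, Result, process_block, serve, sum_of_squares) are invented for this illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

using ParamSet = std::vector<double>;

// What a client returns for one parameter set: value plus gradient.
struct Result {
  double value;
  std::vector<double> gradient;
};

using WorkerFn = std::function<Result(const ParamSet&)>;

// Client side: evaluate every parameter set in the received block.
std::vector<Result> process_block(const WorkerFn& f,
                                  const std::vector<ParamSet>& block) {
  std::vector<Result> out;
  out.reserve(block.size());
  for (const ParamSet& p : block) out.push_back(f(p));
  return out;
}

// Server side: cut the parameter sets into blocks of block_size, hand each
// block to a client (here a direct call standing in for the communication),
// and collect the results in the original order.
std::vector<Result> serve(const WorkerFn& f, const std::vector<ParamSet>& sets,
                          std::size_t block_size) {
  std::vector<Result> all;
  if (block_size == 0) return all;
  for (std::size_t start = 0; start < sets.size(); start += block_size) {
    const std::size_t stop = std::min(start + block_size, sets.size());
    std::vector<ParamSet> block(sets.begin() + start, sets.begin() + stop);
    std::vector<Result> part = process_block(f, block);
    all.insert(all.end(), part.begin(), part.end());
  }
  return all;
}

// Toy worker function: f(p) = sum_k p_k^2 with gradient 2 * p_k.
Result sum_of_squares(const ParamSet& p) {
  Result r{0.0, std::vector<double>(p.size())};
  for (std::size_t k = 0; k < p.size(); ++k) {
    r.value += p[k] * p[k];
    r.gradient[k] = 2.0 * p[k];
  }
  return r;
}
```

For example, serve(sum_of_squares, sets, 2) would walk through the sets two at a time. In a real setup each block would go to a different worker and the gradients would come from autodiff rather than being hand-coded.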
Splitting things between math and stan makes things a bit more involved, yes. From the math library perspective, we want to calculate the gradient of a function with some parameters in a distributed way. I am not sure yet how, but if closures let us leave the data handling out of what the math library has to do, that would possibly make things easier. So the function signature of a functor given to math is real foo(real params) (or, for the index design, real foo(real params, int i)). My thinking is to have in math a singleton facility where we can register such functor objects, which essentially follow the "Command Design Pattern" as described in Modern C++. Once things are registered, we can do repeated evaluations of the functor with varying parameter sets.
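Here is a minimal sketch of what such a singleton registry could look like. Everything in it is an assumption for illustration: the names Command and CommandRegistry are made up, the functors are keyed by a string, and the parameters are passed as a std::vector<double> rather than whatever type math would actually use.

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// A registered command: takes parameters only, returns the function value.
// Any data it needs is captured inside the closure.
using Command = std::function<double(const std::vector<double>&)>;

// Hypothetical singleton facility holding the registered functors.
class CommandRegistry {
 public:
  // Function-local static: initialized once, on first use.
  static CommandRegistry& instance() {
    static CommandRegistry registry;
    return registry;
  }

  // Register (or overwrite) a functor under the given name.
  void add(const std::string& name, Command cmd) {
    commands_[name] = std::move(cmd);
  }

  // Evaluate a previously registered functor for one parameter set.
  double call(const std::string& name,
              const std::vector<double>& params) const {
    auto it = commands_.find(name);
    if (it == commands_.end())
      throw std::out_of_range("unknown command: " + name);
    return it->second(params);
  }

 private:
  CommandRegistry() = default;
  CommandRegistry(const CommandRegistry&) = delete;
  CommandRegistry& operator=(const CommandRegistry&) = delete;

  std::map<std::string, Command> commands_;
};
```

A functor registered as a lambda that captures its data by value, e.g. [data](const std::vector<double>& p) { ... }, carries the static data with it, so the registry itself only ever sees parameters. The index design would just add an int argument to Command.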
... stuff can be done at a later stage. I was hoping that the new C++11 features would make this easy to deal with and allow nicer syntax, but we can just defer this, as it would be a generalization anyway.