I’ve got a set of ODE models I’ve been developing that simulate crop growth and development over time. I’m now interested in using them to analyze larger datasets, but the run time is slow given the relatively high computational cost of the numerical integration of the ODE system. To address the bottleneck, I have already implemented my own function for performing Euler integration with a fixed time step and accuracy seems fine. I have also implemented analytic gradient calculation for the ODEs in external C++. I am now at the point where parallelization seems to be the next step for reducing runtime.
I have used map_rect to parallelize calculations before, but these models are strongly forced by several input data sources with varying structures and the requirement of packing/unpacking all the input data into one array has become burdensome. I am considering writing my own code using MPI and wanted to ask for pointers on where to start.
At this point, I was thinking to handle all this in custom functions written as external C++ code.
I have looked into the code in map_rect_mpi.hpp, mpi_cluster.hpp, etc to get an idea of how map_rect+MPI works. Still trying to wrap my head around where MPI is initialized, data transferred, etc.
The access routine that controls the MPI flow is mpi_parallel_call, which is called by map_rect_mpi that you have located. The comments there should give you a general idea how it’s set up.
On the other hand, if avoiding packing/unpacking is the only motivation you may want to check out the multithreading through reduce_sum func instead of MPI. It addresses the ergonomic flaw of map_rect by allowing in variadic args. But it won’t help you if by “varying structures” you mean some special c++ construct you use in custom ODE solver.
I will take a look at mpi_parallel_call and see what I can glean from the comments.
Regarding reduce_sum, it doesn’t quite fit for what I’m looking for in this case, but I do appreciate the more flexible interface. Is there any a priori reason why map_rect could not be extended in a similar way?
I would guess that the data transfer to MPI child processes would be more complicated. Is there anything else I’m missing there?
I think that’s about it. You just need to figure out how to transfer the data. The relevant boost technique can be found here. See its use in Stan in struct mpi_command in mpi_command.hpp.
I have read over everything now. My brain feels like it has just finished Thanksgiving dinner and needs time to digest everything. Suffice it to say, I am still ruminating on how best to proceed.
By the way, while on my reading quest to get up to speed, I stumbled across the MPI Design Discussion topic. I thought I would link to it here since it has a fairly comprehensive (albeit lengthy) discussion on the thought process behind the current MPI implementation.
@palderman slightly off topic, but how much a speedup did you get implementing your own forward Euler solver versus using the solvers in Stan? Was the speedup because the Euler is solving a looser precision? Do you still get reasonably similar posteriors? And then how much of a speedup using a custom gradient for the forward Euler versus just autodiffing through it?
Hi @arya, those are all good questions. I haven’t actually tried to use the built in Stan ODE integrator functions. For accuracy comparisons I was comparing against an RK4 integrator with a fixed time step that I coded myself. I suppose it would be worth comparing at some point. However, the models I am working with are forced by multiple daily weather variables and I couldn’t figure out a convenient way to get the data in and aligned with the right time point within the Stan rk45 integrator.
Hi @yizhang, I’m trying to do the cost benefit analysis on whether to
Implement something custom as I set out to do
Work on extending the code for map_rect to allow variadic arguments
I think I’m pretty clear on the next steps if I go with option 1.
If I decided to try option 2, I was thinking to start by sketching out some design specs and writing corresponding tests. What is the standard convention/protocol for developing something that would seamlessly integrate with on-going development efforts? Do I just fork the repo and start coding? Should I start a new discussion topic to get input from other Stan developers?
Thank you for considering contributing! For a final PR one will definitely need a fork, and yes it’s always a good idea to hash out a design on the forum. In addition, tho it’s not frequently done I’d recommend have a design doc ready before full-scale development, considering the scale of this particular project. See https://github.com/stan-dev/design-docs/pull/28 for a related example.
I have actually started working on a variadic map_rect type of thing. I just pushed the branch:
The idea is simply to piggy-pack the variadic thing onto the existing map_rect. This means to serialise the variadic arguments in a clever way so that the format can be send over map_rect.
I have barely time to work on it…if you want to join or take over - go for it. I would then make a separate post on this to dump my thoughts on this in a bit more detail as to how I think this should be tackled.
I would love to collaborate on the variadic argument version of map_rect. I, too, am somewhat time-constrained, but maybe between the two of us we can make some progress. :-)
How about if you create a separate post on the topic and mention me there so I can find it?