Fitting ODE models with custom functions using MPI?

palderman · March 12, 2022, 11:04pm

Hi all,

I’ve got a set of ODE models I’ve been developing that simulate crop growth and development over time. I’m now interested in using them to analyze larger datasets, but the run time is slow given the relatively high computational cost of the numerical integration of the ODE system. To address the bottleneck, I have already implemented my own function for performing Euler integration with a fixed time step and accuracy seems fine. I have also implemented analytic gradient calculation for the ODEs in external C++. I am now at the point where parallelization seems to be the next step for reducing runtime.

I have used map_rect to parallelize calculations before, but these models are strongly forced by several input data sources with varying structures and the requirement of packing/unpacking all the input data into one array has become burdensome. I am considering writing my own code using MPI and wanted to ask for pointers on where to start.

At this point, I was thinking to handle all this in custom functions written as external C++ code.
I have looked into the code in map_rect_mpi.hpp, mpi_cluster.hpp, etc to get an idea of how map_rect+MPI works. Still trying to wrap my head around where MPI is initialized, data transferred, etc.

I see that @rok_cesnovar and @peterwicksstringfield (among others) have committed MPI-related code to the repo recently.

Is there anywhere else I should look or anything else I should be aware of going forward?

Is custom C++ code the way to go? Or would it be better to try to extend map_rect to allow passing multiple data structures as additional arguments?

Any input would be much appreciated.

yizhang · March 13, 2022, 2:43am

The access routine that controls the MPI flow is mpi_parallel_call, which is called by map_rect_mpi that you have located. The comments there should give you a general idea how it’s set up.

On the other hand, if avoiding packing/unpacking is the only motivation you may want to check out the multithreading through reduce_sum func instead of MPI. It addresses the ergonomic flaw of map_rect by allowing in variadic args. But it won’t help you if by “varying structures” you mean some special c++ construct you use in custom ODE solver.

palderman · March 13, 2022, 2:59pm

Thanks for the quick response.

I will take a look at mpi_parallel_call and see what I can glean from the comments.

Regarding reduce_sum, it doesn’t quite fit for what I’m looking for in this case, but I do appreciate the more flexible interface. Is there any a priori reason why map_rect could not be extended in a similar way?

I would guess that the data transfer to MPI child processes would be more complicated. Is there anything else I’m missing there?

yizhang · March 13, 2022, 5:08pm

Simply that we have no developer working on it.

I think that’s about it. You just need to figure out how to transfer the data. The relevant boost technique can be found here. See its use in Stan in struct mpi_command in mpi_command.hpp.

palderman · March 14, 2022, 1:57pm

Thanks for the pointers. I’ll read over everything you’ve shared and try to put together a scaled down example model that can serve as a test case.

palderman · March 15, 2022, 11:35am

I have read over everything now. My brain feels like it has just finished Thanksgiving dinner and needs time to digest everything. Suffice it to say, I am still ruminating on how best to proceed.

By the way, while on my reading quest to get up to speed, I stumbled across the MPI Design Discussion topic. I thought I would link to it here since it has a fairly comprehensive (albeit lengthy) discussion on the thought process behind the current MPI implementation.

arya · March 15, 2022, 5:58pm

@palderman slightly off topic, but how much a speedup did you get implementing your own forward Euler solver versus using the solvers in Stan? Was the speedup because the Euler is solving a looser precision? Do you still get reasonably similar posteriors? And then how much of a speedup using a custom gradient for the forward Euler versus just autodiffing through it?

palderman · March 16, 2022, 12:14am

Hi @arya, those are all good questions. I haven’t actually tried to use the built in Stan ODE integrator functions. For accuracy comparisons I was comparing against an RK4 integrator with a fixed time step that I coded myself. I suppose it would be worth comparing at some point. However, the models I am working with are forced by multiple daily weather variables and I couldn’t figure out a convenient way to get the data in and aligned with the right time point within the Stan rk45 integrator.

I know there are some others who have experimented with Euler integration (e.g. Costs/benefits of solving large ODE systems as systems of difference equations/in discrete time). You may want to post your question as its own topic. You might get a better response.

palderman · March 16, 2022, 11:17am

Hi @yizhang, I’m trying to do the cost benefit analysis on whether to

Implement something custom as I set out to do
Work on extending the code for map_rect to allow variadic arguments

I think I’m pretty clear on the next steps if I go with option 1.

If I decided to try option 2, I was thinking to start by sketching out some design specs and writing corresponding tests. What is the standard convention/protocol for developing something that would seamlessly integrate with on-going development efforts? Do I just fork the repo and start coding? Should I start a new discussion topic to get input from other Stan developers?

yizhang · March 16, 2022, 4:46pm

Thank you for considering contributing! For a final PR one will definitely need a fork, and yes it’s always a good idea to hash out a design on the forum. In addition, tho it’s not frequently done I’d recommend have a design doc ready before full-scale development, considering the scale of this particular project. See https://github.com/stan-dev/design-docs/pull/28 for a related example.

wds15 · March 17, 2022, 9:09am

I have actually started working on a variadic map_rect type of thing. I just pushed the branch:

github.com

stan-dev/math/blob/feature/issue-map-rect-variadic-proto/test/unit/math/rev/functor/map_rect_variadic_test.cpp

// the tests here check that map_rect_concurrent works correct as such
// we enforce that STAN_MPI is NOT defined

#ifdef STAN_MPI
#undef STAN_MPI
#endif

#include <gtest/gtest.h>
#include <stan/math/rev.hpp>

#include <test/unit/math/prim/functor/hard_work.hpp>
#include <test/unit/math/prim/functor/utils_threads.hpp>

#include <iostream>
#include <vector>

struct variadic_hard_work_1 {
  variadic_hard_work_1() {}
  template <typename T1>
  Eigen::Matrix<stan::return_type_t<T1>, Eigen::Dynamic, 1> operator()(

This file has been truncated. show original

The idea is simply to piggy-pack the variadic thing onto the existing map_rect. This means to serialise the variadic arguments in a clever way so that the format can be send over map_rect.

I have barely time to work on it…if you want to join or take over - go for it. I would then make a separate post on this to dump my thoughts on this in a bit more detail as to how I think this should be tackled.

palderman · March 17, 2022, 8:09pm

Hi @wds15,

I would love to collaborate on the variadic argument version of map_rect. I, too, am somewhat time-constrained, but maybe between the two of us we can make some progress. :-)

How about if you create a separate post on the topic and mention me there so I can find it?

Topic		Replies	Views
Efficient use of map_rect for parallelizing ODE systems CmdStan	10	776	April 28, 2020
Map_rect causing substantial slowdown; trying to understand how to fix Modeling cmdstanr , paralellization	13	882	July 24, 2020
Understanding MPI and hierarchical models Algorithms	6	951	December 15, 2019
Estimating resources needed on a cluster General ode , paralellization	4	456	December 17, 2020
Extending reduce_sum to use MPI Developers	5	486	June 1, 2021

Fitting ODE models with custom functions using MPI?

Related topics