Fully working parallel ODE Stan using OpenMP

Hi!

I never thought to get this going so quickly, but I just managed to fit a Stan model which solves an ODE system on multiple CPUs in parallel within a chain… and the speedup is linear in the number of CPUs (I used 4 CPUs). To pull this off, I had to do two things (i) make the ODE integration (and its sensitivity ODE system) a double only operation and (ii) I had to come up with a trick to still be able to do dosing events which are discontinuous injections into the system.

Now, what I have done requires the user to provide the Jacobian of the ODE wrt to the states and the parameters. Moreover, as the ODE facilities in Stan are not tailored for this, I had to hijack the integrate_ode_bdf function.

So to make this available to users there is some work needed:

  1. Refactor the ODE system such that ODE integration does not touch the AD stack whenever Jacobians are given
  2. Provide a facility to give the Jacobian (either via the Stan language or by code generation approaches)
  3. It would be nice to have ragged arrays, but this is optional

For applications in Pharma this is a major step forward. I propose to discuss the matter at the next Stan meeting.

Best,
Sebastian

1 Like

Awesome!

Remember when we were talking about the design for providing Jacobians for
ODEs? This is the use case I had in mind. I’ll take the hit of some speed
if we can make it easy to write functors that wrap ODE functions and can be
replaced with specific ones that don’t use autodiff.

I will throw probably later this week this approach onto one of our 30 core machines here. I have problems which take 3 days to compute when using ODEs… Making Stan scale-able to multiple CPUs is for me a breakthrough as we have multiple machines with 30 cores around here. I used so far an (cool) approximation to get 3 days running time down to 4h, but now I am optimistic that with brute force computing power one can solve things in equal time - exactly.

I am not sure what you mean, @syclik, with speed hit - there is no performance hit at all when using my approach. Providing the Jacobian information makes everything faster; providing more CPUs makes it even faster.

As we have now the use case for which our refactoring was targeted, I hope that we can make the refactoring an efficient process as its clear what the goals are and how the final application must look like. I suggest to discuss on Thursday should we have time for this.

Sebastian