Parallel ODE first steps roadmap

wds15 · February 10, 2017, 8:35am

Hi!

Thanks for the discussion yesterday. I took from the meeting that we all agree that parallel ODE stuff is great to have for all users, but there are potentially big decisions to make at some point. Since at this stage it is clear what steps need to be taken to get into the direction we want with the ODE system we can start to get moving already now.

Short term goal is to refactor the ODE system for which Daniel and Charles want to join me. Thank you both, I suggest that I create an experimental branch on stan-math where I dump a working prototype in the form of a test into that branch. I hope this will aid our efforts in designing and writing things. In addition I will start a wiki for that early next week.

Additionally I will create a parallel ODE design spec wiki page which will list the more longer term goals of this initiative.

So thanks for everyone contributing! This is to a me a major advancement of Stan and if this flies then it will unlock a huge potential of Stan and will likely trigger lots of more uses of Stan, i.e. building the ground for more projects in industry relevant contexts.

Sebastian

syclik · February 10, 2017, 12:39pm

That sounds great. Please put a prototype on a branch. Since we don’t get notifications on new branches, can you let us know when it’s up? Thanks!

charlesm93 · February 10, 2017, 7:42pm

+1 to everything said. Excited to work on this!

charlesm93 · February 10, 2017, 8:30pm

@wds15: you mentioned during yesterday’s meeting you had created a dosing function, to avoid interrupting the ODE integration. I had a few questions about it. First off does it make a difference to run the ODE without interruption? Algorithmically speaking, it seems it should the same number of steps, whether there are interruptions or not.

Secondly, what does this dosing function look like? The only thing I can imagine would be some code inside the ODE system which uses conditional statements to add, say, a bolus dosing at certain times. I’m not sure how practical this approach would be, and I’m assuming you came up with something better.

wds15 · February 11, 2017, 4:06pm

Bump.

The first prototype is now on the branch feature/proto-parallel-ode on stan-math. I have setup things as a test, so you want to look at this:

github.com

stan-dev/math/blob/9069b3255ef9ad7edb5949aba00ed45fa3ccf18d/test/unit/math/rev/mat/functor/integrate_ode_parallel_proto_test.cpp

/*
 * Parallel ODE Prototype
 *
 * Note: We need an OpenMP capable compiler. clang on MacOS is not
 * openMP capable while g++ is. So make sure you use in the make/local
 *
 * CC=g++
 * CFLAGS += -DNDEBUG -fopenmp
 * CFLAGS_GTEST += -DNDEBUG -fopenmp
 *
 * Also, its recommended to export the variable on the shell
 *
 * OMP_NUM_THREADS=4
 *
 * to set the number of used cores to 4 should you have 4 CPUs on your
 * machine.
 */

#include <stan/math/rev/mat.hpp>
#include <stan/math/rev/core.hpp>

This file has been truncated. show original

Please make sure that you use g++ with the -fopenmp compiler option to make OpenMP work. In case you do not use a compatible compiler, you won’t get any speedup, but the code will still work.

At the moment this approach only works with the stiff solver which is what I would like to change with the refactor.

@charlesm93: I have turned the dosing into a continuous function and what I do is a compromise and in fact an approximation to a bolus. Have a look at the oral_2cmt.hpp where I demonstrate it. Essentially what I do is to use the fact that I can represent dosing as a sum over delta-peaks. While delta-peaks are discontinuous, we can approximate them with a narrow normal density. So multiple dosing will be represented as a mixture of normals which I find quite cool. This makes dosing continuous and puts a bit of stress onto the ODE solver, but the convenience gain is enormous as I can then parallelize the integrator over patients and thats it. You can improve on that, yes, by making a function which is parallelized and takes care of dosing, yes, but this is a lot more tedious and I do not think that this is a function which we want to see in the base Stan, i.e. an integrate_ode_parallel_dosing - but you can do this in Torsten, of course. I haven’t yet explored fully this approach, but first tests are very promising. Moreover, this approach enables you to deal in a straightforward way with lag times, since dosing is a continuous function now! And the extra stress on the solver is made up for with less cvodes inits and brute force parallelization over the CPUs.

I will start the wiki early next week.

wds15 · February 11, 2017, 5:38pm

@charlesm93 I forgot about one more major advantage of representing dosing as I am proposing: The size of the sensitivity system is dramatically decreased! So when we solve an ODE with N states and P parameters, we have to get the sensitivities in addition. That means that per sensitivity parameter we have to solve N more ODEs. Since we need the sensitivities for the P parameters and the N initial states whenever we integrate from dose to dose we end up having to solve

N * (1 + P + N)

ODEs in total. This make, for example, in the case of a oral 2cmt dosing (N=3, K=4) a whopping 24 ODEs to solve!

Now, when I do not integrate from dose to dose, but instead represent dosing as a continuous function which is fixed by data only, then I do not need to treat the initials as varying and hence the ODEs to solve reduce to

N * (1 + P)

So I save N^2 equations!! This makes for the oral 2cmt case a reduction to only 15 ODEs. That’s a major speedup just there.

And BTW, there is a very simple, but effective trick to force CVODES to stop at each dosing event: Simply request a solution to be output at that point in time.

charlesm93 · February 14, 2017, 2:04pm

@wds15 This is indeed a very cool approach and I’m eager to take a closer look. I can certainly appreciate how convenient it would be to deal with a continuous function, when we work with tlag parameters or any time dependent parameters for that matter.

I’m unclear about the second advantage. I need to read up on sensitivity systems, but intuitively, it seems that, while you reduce the number of ODEs, each solution spans a greater time period and it takes the integrator more steps to build the solution function. I’m guessing the speedup comes from the fact we only calculate the sensitivity once instead of N times – and if we interrupt the integrator, there is no way to pass the sensitivity calculations from one ODE solving to the other.

syclik · February 14, 2017, 2:07pm

@wds15, we’re missing at least one file on the branch:
test/unit/math/rev/mat/functor/integrate_ode_bdf_bare.hpp

I was trying to run the
test/unit/math/rev/mat/functor/integrate_ode_parallel_proto_test.cpp test
and it failed to compile.

wds15 · February 14, 2017, 2:46pm

Hi!

@syclik: Sorry for forgetting that file. I will push it later and give you a heads up.

FYI: The “_bare.hpp” is doing exactly the same as the integrate_ode_bdf.hpp except that it does not do the decoupling operation, but returns y_coupled in double-only format.

@charlesm93: N^2 less ODEs is a huge advantage. Ok, the integrator has to slow down whenever he is in the vicinity of a dose, but the consequence is only a shorter step-size of the integrator in that range which will adaptivley again be increased. Also, since the normals will have almost vanishing density once you are more than 6 standard deviations aways, you can actually restrict the dosing function to be only non-zero if you are less than 6 sds close to any dosing event.

Sebastian

wds15 · February 14, 2017, 5:21pm

Bump.

I just pushed that missing file and verified on a clean checkout that things compile and run.

Sebastian

syclik · February 15, 2017, 4:07am

Can you describe what the test is doing? I spent half an hour trying to figure out the structure to line up gradients. It looks right, but maybe you can describe what it’s doing.

wds15 · February 15, 2017, 11:59am

I added a few more comments which hopefully help and pushed these to the branch.

The test does not “test” anything in the sense of checking whatever. Instead I abuse the test to show the performance of running with openmp as you get the walltime reported from google test.

I have established that the approach gives exactly the same when running in serial or parallel mode on a Stan model.

If you want we could have a hangout today to discuss, should you have questions.

charlesm93 · February 15, 2017, 2:49pm

If you have a hangout, I’d be interested in participating. If I get enough of a heads up on time, I should be able to work through the code.

wds15 · February 15, 2017, 3:22pm

Hmm… maybe its best if we three organize a hangout among us for early next week? Then you both should have sufficient time to go through the code and we can sync on how to proceed. Next week Monday starting time of the meeting between 8am to 11am your time would work for me. Let’s align by mail the details should this be a good idea.

FYI: I think that by tomorrow I have a fully auto-generated Stan model available. I mean I managed to define the ODE in R, then use R facilities to obtain the analytical Jacobians and dump all the needed defs into small files which you can then include in the Stan model and the C++. That means you can define the ODE in R and all other defs (the ODE itself in Stan, the analytical C++ Jacobian and whatever is needed) are auto-magically generated. The only restriction is that the D operator in R only supports a limited set of functions, but these are by far sufficient for the ODEs I run across.

Best,
Sebastian

charlesm93 · February 20, 2017, 4:38pm

I’m following up on the discussion this morning about running the unit test. After doing the modification recommended in the code, I got the following error at compilation:
ld: library not found for -lgomp.

The test ran after installing libiomp and setting

LDFLAGS = -lgomp

I think we should add this to the comments at the top of the unit test file.

wds15 · February 20, 2017, 5:02pm

It sounds as if you installed libiomp manually… what system are u on?

I just installed gcc6 from macports on my Mac and that was it.

It can well be that I forgot to doc the

LDFLAGS=-lgomp

flag for the linker. This links in openMP stuff; and yes this should be doced.

Any thoughts on the design?

Sebastian

Topic		Replies	Views
Within Chain ODE Parallelization Results Developers features	11	1612	March 22, 2017
Fully working parallel ODE Stan using OpenMP Developers	2	809	February 6, 2017
ODE refactor now has a path forward Developers maintenance	45	2043	October 20, 2016
Parallelism in Stan Developers	0	572	March 6, 2017
Adjoint ODE Prototype - RFC - Please Test General ode	41	2758	April 7, 2021

Parallel ODE first steps roadmap

Related topics