Thanks for the discussion yesterday. I took from the meeting that we all agree that parallel ODE stuff is great to have for all users, but there are potentially big decisions to make at some point. Since at this stage it is clear what steps need to be taken to get into the direction we want with the ODE system we can start to get moving already now.
Short term goal is to refactor the ODE system for which Daniel and Charles want to join me. Thank you both, I suggest that I create an experimental branch on stan-math where I dump a working prototype in the form of a test into that branch. I hope this will aid our efforts in designing and writing things. In addition I will start a wiki for that early next week.
Additionally I will create a parallel ODE design spec wiki page which will list the more longer term goals of this initiative.
So thanks for everyone contributing! This is to a me a major advancement of Stan and if this flies then it will unlock a huge potential of Stan and will likely trigger lots of more uses of Stan, i.e. building the ground for more projects in industry relevant contexts.
@wds15: you mentioned during yesterday’s meeting you had created a dosing function, to avoid interrupting the ODE integration. I had a few questions about it. First off does it make a difference to run the ODE without interruption? Algorithmically speaking, it seems it should the same number of steps, whether there are interruptions or not.
Secondly, what does this dosing function look like? The only thing I can imagine would be some code inside the ODE system which uses conditional statements to add, say, a bolus dosing at certain times. I’m not sure how practical this approach would be, and I’m assuming you came up with something better.
The first prototype is now on the branch feature/proto-parallel-ode on stan-math. I have setup things as a test, so you want to look at this:
Please make sure that you use g++ with the -fopenmp compiler option to make OpenMP work. In case you do not use a compatible compiler, you won’t get any speedup, but the code will still work.
At the moment this approach only works with the stiff solver which is what I would like to change with the refactor.
@charlesm93: I have turned the dosing into a continuous function and what I do is a compromise and in fact an approximation to a bolus. Have a look at the oral_2cmt.hpp where I demonstrate it. Essentially what I do is to use the fact that I can represent dosing as a sum over delta-peaks. While delta-peaks are discontinuous, we can approximate them with a narrow normal density. So multiple dosing will be represented as a mixture of normals which I find quite cool. This makes dosing continuous and puts a bit of stress onto the ODE solver, but the convenience gain is enormous as I can then parallelize the integrator over patients and thats it. You can improve on that, yes, by making a function which is parallelized and takes care of dosing, yes, but this is a lot more tedious and I do not think that this is a function which we want to see in the base Stan, i.e. an integrate_ode_parallel_dosing - but you can do this in Torsten, of course. I haven’t yet explored fully this approach, but first tests are very promising. Moreover, this approach enables you to deal in a straightforward way with lag times, since dosing is a continuous function now! And the extra stress on the solver is made up for with less cvodes inits and brute force parallelization over the CPUs.
@charlesm93 I forgot about one more major advantage of representing dosing as I am proposing: The size of the sensitivity system is dramatically decreased! So when we solve an ODE with N states and P parameters, we have to get the sensitivities in addition. That means that per sensitivity parameter we have to solve N more ODEs. Since we need the sensitivities for the P parameters and the N initial states whenever we integrate from dose to dose we end up having to solve
N * (1 + P + N)
ODEs in total. This make, for example, in the case of a oral 2cmt dosing (N=3, K=4) a whopping 24 ODEs to solve!
Now, when I do not integrate from dose to dose, but instead represent dosing as a continuous function which is fixed by data only, then I do not need to treat the initials as varying and hence the ODEs to solve reduce to
N * (1 + P)
So I save N^2 equations!! This makes for the oral 2cmt case a reduction to only 15 ODEs. That’s a major speedup just there.
And BTW, there is a very simple, but effective trick to force CVODES to stop at each dosing event: Simply request a solution to be output at that point in time.
@wds15 This is indeed a very cool approach and I’m eager to take a closer look. I can certainly appreciate how convenient it would be to deal with a continuous function, when we work with tlag parameters or any time dependent parameters for that matter.
I’m unclear about the second advantage. I need to read up on sensitivity systems, but intuitively, it seems that, while you reduce the number of ODEs, each solution spans a greater time period and it takes the integrator more steps to build the solution function. I’m guessing the speedup comes from the fact we only calculate the sensitivity once instead of N times – and if we interrupt the integrator, there is no way to pass the sensitivity calculations from one ODE solving to the other.
@syclik: Sorry for forgetting that file. I will push it later and give you a heads up.
FYI: The “_bare.hpp” is doing exactly the same as the integrate_ode_bdf.hpp except that it does not do the decoupling operation, but returns y_coupled in double-only format.
@charlesm93: N^2 less ODEs is a huge advantage. Ok, the integrator has to slow down whenever he is in the vicinity of a dose, but the consequence is only a shorter step-size of the integrator in that range which will adaptivley again be increased. Also, since the normals will have almost vanishing density once you are more than 6 standard deviations aways, you can actually restrict the dosing function to be only non-zero if you are less than 6 sds close to any dosing event.
Can you describe what the test is doing? I spent half an hour trying to figure out the structure to line up gradients. It looks right, but maybe you can describe what it’s doing.
I added a few more comments which hopefully help and pushed these to the branch.
The test does not “test” anything in the sense of checking whatever. Instead I abuse the test to show the performance of running with openmp as you get the walltime reported from google test.
I have established that the approach gives exactly the same when running in serial or parallel mode on a Stan model.
If you want we could have a hangout today to discuss, should you have questions.
Hmm… maybe its best if we three organize a hangout among us for early next week? Then you both should have sufficient time to go through the code and we can sync on how to proceed. Next week Monday starting time of the meeting between 8am to 11am your time would work for me. Let’s align by mail the details should this be a good idea.
FYI: I think that by tomorrow I have a fully auto-generated Stan model available. I mean I managed to define the ODE in R, then use R facilities to obtain the analytical Jacobians and dump all the needed defs into small files which you can then include in the Stan model and the C++. That means you can define the ODE in R and all other defs (the ODE itself in Stan, the analytical C++ Jacobian and whatever is needed) are auto-magically generated. The only restriction is that the D operator in R only supports a limited set of functions, but these are by far sufficient for the ODEs I run across.
I’m following up on the discussion this morning about running the unit test. After doing the modification recommended in the code, I got the following error at compilation:
ld: library not found for -lgomp.
The test ran after installing libiomp and setting
LDFLAGS = -lgomp
I think we should add this to the comments at the top of the unit test file.