Coupled ode system in cmdstan 2.18


#41

Why do you think they’re necessary? So there aren’t multiple function calls?

This is a line I’d really rather not cross. It completely upends the semantics of Stan as it stands now.

I’d rather wait until we have a proper tuple/struct implementation (maybe a year out).

Not really. We took a heavily safety-first approach. But I think we’re now a lot better at C++ than we used to be, so it’s wroth re-examining. In-place operations and noalias can be tricky to work together. It’s precisely that x *= y case that’s so tricky when x is part of y.

This needs to go along with auto intermediates (though those are also tricky in Eigen) to prevent eager copies of expression templates and also more generic arguments and better code generation at the C++ level for Stan models.


#42

Well, I have written the prototype with maximal speed in mind. When now moving to a design with by value return of the Jacobian we will see how much this is of a penalty.

The noalias statement made the code noticeably faster. We should really then check the codebase for opportunities to given Eigen this hint.


#43

Sounds good. I just don’t know who has time to do it.


#44

Is it possible to use it somehow? Soon I should have another large ODE problem (this time with proteins) so would love to try the speedup out. I was just thinking that it could be approached similar to MPI - one has to compile cmdstan to enable MPI using MPI switch in make. Here we could use ODE switch.


#45

What would that flag do?

There’s a side effect of having multiple flags—the combinatorics of testing grows exponentially unless they’re provably independent.


#46

I think we already agreed on another function in stan like

integrate_ode_rk45_jacobian

or however we want to call it… we could even just add another overload to stan-math like this

  • integrate_ode_rk45 simple with fixed default tolerances
  • integrate_ode_rk45 with optional tolerances + max_steps
  • integrate_ode_rk45 with optional tolerances + max_steps + Jacobians

How about that?

@linas : To use my code… just grab that branch from the rstan repo. It should work with the current 2.18 cmdstan and rstan 2.18.2. The speedups are really cool I have to say.

(the biggest issue with the new functions is my own time resource, but may @syclik can give a hand?)


#47

Perhaps I am not doing the right way: I have one 2.18 cmdstan in folder cmdstan compiled with default make and the same version of cmdstan in folder cmdstanMPI compiled with MPI. I thought about having folder cmdstanMPIODE compiled with MPI and analytic Jacobian. From R it is very easy to switch to required mode (normal, MPI, MPI with ODE) as well as substituting appropriate make/local file. While in theory testing has to be done for all combinations I think that not all combinations are used - bit of engineering approach…


#48

Not that want to change what it is agreed upon. I think adding suffix jacobian is real great solution and no need to mess with make flags.


#49

@wds15, I think we can come up with a C++ design that works.

I think we can come up with a separate Stan language design. What I think would make sense is something like…

integrate_ode(ode_function, jacobian_function, ...)

Let me know what you think.


#50

This is another great option. Almost all optimization routines have a parameter for jacobian.


#51

Yes, a function like that makes sense. Just bear in mind that we need two jacobian functions, I think. One for the Jacobian wrt to the states and one wrt to the parameters.

By now I have compared a version of my prototype using passing things by reference vs passing by value. The result is 74s (by ref) vs 123s (by value)… which I find quite a lot of performance to pay for copying stuff around. To me this raises some concerns as to how useful such an extra function in Stan really is given that you can get so much more performance with a bit of C++. Is by reference passing at some point in time realistic in the Stan language?

Also, if we would add such functions, then we should probably think a moment what to refactor first. For example, the state decoupling can be pulled out of coupled_ode_system (and maybe the decoupling could be done during integration and also use en-block memory allocation?).

The comparison is between these codes for the by ref and by value thing:

I am quite amazed how much faster by reference here is!