Hello everyone!
I wanted to briefly summarise the results from the experimental phase of the adjoint ODE feature before we now start to integrate it into Stan in its first form.
From what I have observed we got this feedback:
- @Funko_Unko reports that the adjoint method is quite competitive even for relatively small ODE systems. The performance crucially depends on the tolerances set for the backward solve. Looser tolerances for the backward solve either improve performance or, when pushed too far, cause the sampler to fail. => Sounds to me as if we need the backward tolerance controls, and it’s good that things break down if one pushes things too far (rather than giving grossly wrong results).
- @charlesm93 tried out the adjoint method on an SIR-type model, but hasn’t seen any noticeable speedups, as I understood. That’s fair. The method is not meant to solve all problems better than the existing forward ODE solvers. It was not clear to me how much experimentation was done.
- @jriou sounded very excited about large speedups (10x), but he also reported on cases where things derailed. I hope that it’s a similar story as for @Funko_Unko, in that the integrator fails visibly rather than producing posteriors which look OK when they are not. More experiments are planned, as I understood.
- @jtimonen raised a few points as to how comparisons should be done. I am not sure whether any results are ready yet.
In addition, we now have a very mature design doc for the adjoint solver. There is an ongoing discussion, brought up by @betanalpha, about the current interface not supporting per-parameter absolute tolerances. @Funko_Unko seems to miss the feature as well. Given that we don’t have any evidence that this is truly a needed feature for the use cases we aim for, I am much in favour of leaving this feature out and living with the simplification that for the backward quadrature integration only a scalar absolute tolerance (as opposed to a per-parameter vector absolute tolerance) can be specified. Personally, I have not seen such a vector absolute tolerance being used in either CVODES or DifferentialEquations.jl, so I would deem this feature unnecessary for a very broad class of problems. So far, the problems explored are also not limited by the lack of this functionality (and one may actually scale parameters and states to emulate this functionality).
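To make the scaling remark concrete, here is a minimal sketch of why rescaling can stand in for a per-component absolute tolerance. It assumes an error test of the weighted-RMS kind that CVODES documents, where component i is weighted by 1 / (rtol * |y_i| + atol_i). The function name `wrms_norm` and all numbers are hypothetical and only for illustration; this is not code from the adjoint solver itself.

```python
import math

def wrms_norm(err, y, rtol, atol):
    """Weighted RMS error norm; a step would be accepted if the result is <= 1.

    atol holds one absolute tolerance per component.
    """
    weights = [1.0 / (rtol * abs(yi) + ai) for yi, ai in zip(y, atol)]
    return math.sqrt(sum((e * w) ** 2 for e, w in zip(err, weights)) / len(err))

# Hypothetical values, chosen only to show components on very different scales.
rtol = 1e-6
atol_scalar = 1e-8
y = [2.5e3, 4.0e-4]      # component values
err = [3.0e-3, 1.0e-9]   # local error estimates for those components
scale = [1e-3, 1e4]      # user-chosen scale factors s_i

# Option A: rescale the problem and use a single scalar absolute tolerance.
y_scaled = [s * yi for s, yi in zip(scale, y)]
err_scaled = [s * e for s, e in zip(scale, err)]
norm_scaled = wrms_norm(err_scaled, y_scaled, rtol, [atol_scalar] * len(y))

# Option B: keep the original problem and use per-component tolerances atol / s_i.
atol_vector = [atol_scalar / s for s in scale]
norm_vector = wrms_norm(err, y, rtol, atol_vector)

# Both options give the same error norm up to floating-point rounding.
print(norm_scaled, norm_vector)
```

Under this assumed error test, multiplying a component by s_i while keeping a scalar absolute tolerance behaves like giving that component its own absolute tolerance of atol / s_i. Whether the rescaling is convenient for a particular model is a separate question.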
In the course of the design doc discussion, @betanalpha felt strongly that we should not provide a simplified ode_adjoint_tol function call for the time being. This would be premature at this point, where we don’t yet have enough experience with setting up the defaults for the additional control arguments. As a compromise, we will include in the Stan user documentation a suggested starting point for the additional control parameters which go beyond the usual tolerance control parameters of the existing ODE solvers.
I hope the summary is complete and correct. In case things need to be added, please post here.
From my view, we can conclude that the proposed interface ode_adjoint_tol_ctl is good as is: no additional option is needed, nor should we drop any option from the signature.
As a next step, the design doc should be finalised, which I plan to do next. Then we can hopefully approve and merge the design doc by the end of next week (16th of April). In the meantime, the adjoint ODE code will be made fit for merging into Stan-math.
Tagging a few more ODE folks to make sure it pops up in their notification lists: @bbbales2 @yizhang @syclik @rok_cesnovar.
Thanks to everyone putting time into this! Hopefully this thing will give us time back in the future by running things much faster…