DAE performance improvement

Hey folks! Super excited that DAEs and the IDAS integration were added in the recent release by the brilliant @yizhang.
We’ve been putting it through its paces already with a large-but-reasonable DAE system (~200 states, including initial values). As @wds15 mentioned on this thread a couple of years ago, the adjoint method is really required to resolve the speed issues for large systems. Is this a priority for anyone right now? Looking at whether it’s something we could just take on internally, I think it’s a bit of a bear, and those on this thread are much more the experts. Would it help, though, if we were able to provide a small “feature bounty” in the form of dollars?


I’m lost on the “including initial values” part. Is the total number of equations 200, or is the number of parameters 200?

Only ~5 Stan parameters for now; 200 equations indeed. Sorry for the lack of precision there.

In that case I’m not sure how much of a dent the adjoint solver can make. Potentially the speedup can be found from other angles: within the DAE solver there are things like matrix & nonlinear solver structure to exploit, and there are possibilities at the Stan modeling level. I understand the model is not to be shared; has it been run with a small number of iterations, or even with fixed params?

I guess it seemed to us, from the code, like forward mode basically has to compute this N x N matrix (so 200 x 200 here) of sensitivities that later get chain-ruled into the gradients of the target that Stan cares about … Empirically, if we run it (IDAS-via-Stan) with zero parameters, 1 Stan iteration is ~15 minutes for this model. But if we run that same model through IDA (as in, no-S-as-in-IDAS), it takes < 1 second, so we were thinking that those IDAS sensitivities are the source of the slowdown. And reverse mode is supposed to speed that up from N^2 → N?

The forward mode actually solves 1000 additional sensitivity equations on top of the 200-equation state system, since for each of the 200 states we need to solve another 5 sensitivity ODEs (one per parameter). In the adjoint solver run we’d solve the 200-equation system twice (with some other complications that could impact performance). So I don’t think the difference in IDA & IDAS performance is mainly caused by the sensitivity calculation, especially since your IDAS run was not using any parameters.
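To spell out the counting (N = 200 states, P = 5 parameters; the adjoint tally also includes the P quadrature equations for the gradient):

$$
\underbrace{N}_{\text{states}} + \underbrace{N\,P}_{\text{sensitivities}} = 200 + 200\cdot 5 = 1200 \quad \text{equations (forward sensitivity)},
$$

$$
\underbrace{N}_{\text{forward pass}} + \underbrace{N}_{\text{backward adjoint pass}} + \underbrace{P}_{\text{gradient quadratures}} = 200 + 200 + 5 \quad \text{equations (adjoint)}.
$$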

It’s possible that the vanilla controls I coded in Stan are not efficient for your problem. One thing your team can do is replace the IDAS controls in Stan with what’s used in IDA to see if it makes a difference; without any parameter sensitivity calculation, IDAS is identical to IDA.
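For concreteness, the kinds of controls to compare look like this (a rough sketch using the shared IDA/IDAS C API; the tolerance and step values below are placeholders, not the ones the Stan wrapper actually sets):

```cpp
// Sketch: integrator controls worth matching between a standalone IDA driver
// and the IDAS run. `ida_mem` is the pointer returned by IDACreate; the
// numeric values are placeholders for illustration only.
#include <idas/idas.h>  // or <ida/ida.h> for the plain IDA driver

void match_controls(void* ida_mem) {
  IDASStolerances(ida_mem, 1e-8, 1e-8);  // relative / absolute tolerances
  IDASetMaxNumSteps(ida_mem, 10000);     // max internal steps per output time
  IDASetMaxOrd(ida_mem, 5);              // max BDF order (5 is the default)
  IDASetInitStep(ida_mem, 0.0);          // 0.0 lets the solver choose the first step
}
```

Whether each driver calls IDACalcIC to compute consistent initial conditions is another setting worth comparing.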

Oh wow, that’s so much longer!

Of course, you are free to fork the repo and add a bounty. In terms of the Stan org, we have never tried this, but I think it’s something we can discuss. Can you add a stan-math issue for what you want added to Stan and ask about adding a bounty in the issue? Are you thinking of using https://issuehunt.io/? Or something else?

The other option is to pay one of the Stan developers directly. I’m not sure we have a standard procedure for this; we should, though!

@SGB I’ll add this topic (funding features by paying for developer time or offering bounties) for discussion.
Edit: the issue is Funding Stan features · Issue #7 · stan-dev/sgb · GitHub


Thanks for the new funding issue. Cool!

Back in IDAS / DAE land…

In case anyone happens to know the answer: the part we’re sort of stuck on in implementing the adjoint sensitivity version of IDAS is exactly how to implement equation (2.23) from sundials/idas_guide.pdf at 6ddce5d90084d8d1cbb8e12bb5a4402168325efe · LLNL/sundials · GitHub

It does look like this is parallel to the current CVODES implementation for Stan’s ODE adjoint mode (as opposed to implementing equation (2.22), though that is also a question).

Specifically, our question is that (2.23) involves terms with both \lambda and \lambda_T, but when we configure IDAS to run, we have to choose between asking it to solve (2.24), which gives us \lambda_T, OR asking it to solve (2.19), which gives us \lambda. It is not apparent to us how to translate between the two, at least not without a matrix inverse, which is not defined in all cases.

(There is also the matter of how to compute y_p in general, but that’s not a showstopper for us right now)

It seems that the term out front is important for getting the derivatives of the Stan log density correct w.r.t. the algebraic equations specifically, as the other tests pass. A version that uses a matrix inverse for that first term of (2.23) (which fails for other tests in general, but…) does get us to within a factor of 2 on the algebraic equations’ tests, but no cigar.

The whole point of adjoint sensitivity is that we can avoid solving for the integrand in 2.19 directly if we obtain \lambda by solving 2.20 (or get 2.24 by solving 2.25 if dg/dp is sought instead of dG/dp). We achieve this by solving for y forward in time and \lambda backward in time. \lambda is a vector (one component for each state equation) and there’s no matrix inverse involved (except internally in the solver when solving the DAEs).

From 2.23 you’ll see that y_p is not needed except at t = t_0, so we won’t be solving for y_p.
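For reference, the generic DAE adjoint formulation behind these equations (following Cao, Li & Petzold, which is the approach the IDAS guide describes) looks roughly like this, up to the guide’s sign conventions and its index-specific details. With

$$
F(t, y, \dot y, p) = 0, \qquad G(p) = \int_{t_0}^{T} g(t, y, p)\,dt,
$$

the adjoint DAE is solved backward from T to t_0 with the terminal condition F_{\dot y}^T \lambda = 0 at t = T:

$$
\frac{d}{dt}\!\left(F_{\dot y}^{\top}\lambda\right) - F_{y}^{\top}\lambda = -\,g_{y}^{\top},
$$

and the gradient is

$$
\frac{dG}{dp} = \int_{t_0}^{T} \left(g_p - \lambda^{\top} F_p\right) dt \;+\; \left.\lambda^{\top} F_{\dot y}\, y_p\right|_{t = t_0},
$$

so y_p only enters through the boundary term at t_0.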

I’m still curious about the cause of the performance difference between IDA and IDAS when no sensitivities are involved. In that case we are just solving for y in the forward pass, so the 15 min vs. 1 sec observation deserves more investigation.
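If it helps the investigation, here’s a minimal sketch for dumping the integrator statistics from both runs so they can be compared step for step (assuming access to the underlying ida_mem pointer; these getters are part of the shared IDA/IDAS C API, and depending on your SUNDIALS version realtype may be spelled sunrealtype):

```cpp
// Sketch: print IDA/IDAS integrator statistics for a completed run.
// `ida_mem` is the solver memory returned by IDACreate.
#include <cstdio>
#include <idas/idas.h>  // or <ida/ida.h> for the plain IDA driver

void print_ida_stats(void* ida_mem) {
  long int nsteps = 0, nres = 0, netf = 0, nni = 0, nncf = 0, nsetups = 0;
  int last_order = 0;
  realtype last_step = 0;

  IDAGetNumSteps(ida_mem, &nsteps);              // internal time steps taken
  IDAGetNumResEvals(ida_mem, &nres);             // residual evaluations
  IDAGetNumErrTestFails(ida_mem, &netf);         // local error test failures
  IDAGetNumNonlinSolvIters(ida_mem, &nni);       // Newton iterations
  IDAGetNumNonlinSolvConvFails(ida_mem, &nncf);  // Newton convergence failures
  IDAGetNumLinSolvSetups(ida_mem, &nsetups);     // Jacobian / linear solver setups
  IDAGetLastOrder(ida_mem, &last_order);         // BDF order of the last step
  IDAGetLastStep(ida_mem, &last_step);           // size of the last step

  std::printf("steps=%ld res_evals=%ld err_test_fails=%ld newton_iters=%ld "
              "newton_conv_fails=%ld lin_setups=%ld last_order=%d last_step=%g\n",
              nsteps, nres, netf, nni, nncf, nsetups,
              last_order, static_cast<double>(last_step));
}
```

A large gap in step counts or error-test/Newton failures between the two runs would point at the controls rather than the sensitivity machinery.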

Have a look at the adjoint ODE design doc:

https://github.com/stan-dev/design-docs/files/6274031/0027-adjoint-ode.pdf

I did try to align the CVODES and Stan-math notation as well as possible. Maybe that helps.

Here is the PR:
