Stan differential equation functionality development priorities

,

Yes, I was under the impression that that only worked for linear systems.

I was wondering whether people used that language in other fields, and I guess not. It’s the term that biogeochem folks use to describe running a system until it is perceived to be a steady state.

Local errors in a function expansion are different than global errors in an ODE solve.

Also the tolerances here aren’t always 1e-10 (or whatever tolerance pushes us into machine precision) when we do solver things.

1 Like

These numbers are coming from a long time ago, but I think they are reltol = abstol = 1e-4 for some solve.

A gradient computed with CVODES adjoint sensitivity was 44.40 The same gradient computed with forward sensitivity was 44.68.

After some more cursory reading on this, I think making this one-way could make the problem a lot easier. I shared a guide from Wolfram earlier about stiffness detection: Stiffness Detection—Wolfram Language Documentation. It addresses some of what @yizhang mentioned earlier about LSODA dropping stiffness detection because of the inaccuracy of using the norm of the Jacobian to bound the dominant eigenvalue. The Wolfram page lists fewer methods (just one) for stiffness detection to switch an algorithm to stiff mode, compared to non-stiffness detection, for which more options are given. The Mathematica “automatic” non-stiffness detection methodology consists of the following control flow:

  • For direct eigenvalue computation is used. Either or is used, depending on which method is active.

  • For subspace iteration is used with a default basis size of . If the method succeeds then the resulting basis is used to start the method at the next integration step.

  • If subspace iteration fails to converge after iterations then the dominant vector is used to start the Krylov method with a default basis size of . Subsequent integration steps use the Krylov method, starting with the resulting vector from the previous step.

  • If Krylov iteration fails to converge after iterations, then norm bounds are used for the current step. The next integration step will continue to try to use Krylov iteration.

  • Since they are so inexpensive, norm bounds are always computed when subspace or Krylov iteration is used and the smaller of the absolute values is used.

I definitely get where @yizhang is coming from with the large burden of verification and future maintenance that would come with implementing an ODE integrator natively in Stan math. And in many cases, the tried-and-true solvers from the established libraries are the way to go for sure. However, newer methods that are being validated over the past couple of decades aren’t being added to some of the older libraries, and some of the those methods could really benefit the simulation of emerging models with unknown stiff/non-stiff regimes. With the focus on differential equation systems in fields like system biology and the growing need of researchers in those fields to use PPL’s like Stan for robust model comparisons, perhaps ODE maintenance/validation could be factored in as part of Stan’s development upkeep?

@yizhang, are there newer C++ libraries that you trust that could be worth wrapping into Stan (here is one I found that seems more unvalidated/experimental)? And general question for everyone in this thread – does sufficient appetite exist for implementing some sort of switching solver or stiffness detection natively in Stan Math? I think that’s the critical question – if not, we should look at other C++ libraries that could be wrapped and most likely pivot away from discussing a switching algorithm for now. Otherwise, I’m definitely willing to at least learn more about stiffness and non-stiffness detection ahead of efforts to contribute on that front.

Looking in the long term, it would be a heavy investment, but expanded native differential equation functionality in Stan could really make it a selling point compared to TensorFlow Probability, PyMC4. As is, Turing.jl uses compatibility with DifferentialEquations.jl as one of the its main draws. Of course, a PPL would be hard-pressed to ever match a differential equations package in solvers with their different respective focuses, and neither should it because of time, manpower, and financial limitations; the prime development focus must always be on advancing the core HMC algorithm.

In the future, perhaps it would be more efficient resource usage to interface with a differential equations package with an FFI, since the addition of DDE, DAE, SDE, and PDE solvers that are not represented in existing C++/Fortran libraries are separate titanic issues. As an intermediate step for systems biologists, ecologists, climate modelers, astronomers, and the like though, a switching algorithm could be an intermediate step for easier plug-and-play ODE fitting; that would be easier than adding a different type of solver, but would still be useful.

If we feel that an ODE solver, much less other types of solvers, would be too troublesome to implement and maintain in Stan Math, the strategy should be oriented toward getting Stan to be able to call other differential equation packages. Labs across various fields are looking to perform Bayesian inference on an ever greater range of model types and will look elsewhere if they are unable to implement their models with the appropriate solvers in Stan, reducing potential Stan’s user-developer base. Folks can’t appreciate the XMC algorithm if they can’t write their model. At the end of the day, expanded FFI functionality and taking advantage of another package seems like it may very well take less effort than adding DDE, DAE, SDE, PDE, IDE, etc. solvers and would save on the maintenance; I’m not sure whether the most efficient route long-term is native implementation of solvers or addition of an FFI to Stan that allows calling of foreign packages (Sean Talts wrote a little bit about FFIs as one the possible areas of future focus in Stan in his “A Path Forward for Stan” writeup.

By the way, is the Adams-Moulton solver natively built in Stan math, or is it also from CVODE?

The non-stiff solver we have is Boost-RK45. I wonder if there’s a way to subclass one of the steppers or something?

It’s from CVODES. It’s the same code as BDF but with a template parameter flipped.

I’ve heard that used in computer science to talk about databases, which can take a long time to load all their indexes on startup.

How so? If that program with the conditional is broken, the one of the conditions must be broken, because the conditional doesn’t break anything beyond arithmetic tolerance of the functions. For example,

Exactly. The relative difference between those results is 6e-3 >> 1e-4. So it looks like one of the integrators is not respecting error tolerances. Which means it’s broken when used on its own.

The tolerances specified in an ODE solve are local so they look like Taylor expansions. What we’re talking about here is the error at the end of the integration, which can be something quite a bit different.

You can have situations where you pick a timestep that satisfies the local tolerance requirements but is out of the stability region of the integrator (so will blow up over a long period of time). So the single-step behavior is different than the many-step behavior in your integrator.

For geo simulation spin-up means letting the numerical model run from an initial condition & boundary condition that do not satisfy geothermal balance described by the mathematical model, which is always the case when those conditions are based on metocean data. Mathematically what happens is that the initial condition got attracted to the stable limit cycle of the mathematical model. This is a universal practice when the periodic solution of a dynamical system is desired, but the name “spin up” is unique to geo community.

I was talking about “verification”, not “validation”. It’s irrelevant how well an integration scheme is proved on paper to have nice order or stability and coded up nicely in matlab. What matters to Stan implementation is how well it’s verified by various tests and field applications, so people have confidence using it and knowing when to use it. If Stan’s BDF integrator spits error, I’m 99.99% sure it’s caused by my model or param settings. Can we say the same when using an in-house solver coded up a year ago?

Very true. Up to community and dev team at this point then to see whether they feel that the testing for field use would be worth it. With newer solvers not appearing in the reputed C++ libraries, at some point, as I said previously, the question will be whether the community decides to stick with what’s there at the cost of losing some users who need expanded differential equation methods, or taking the risk with some of those implementations (the risk can be mitigate somewhat with caveats in the docs and program messages to tell people to default to the more established methods if they want less risk. And if there is desire for the addition of newer solvers, there will be decisions between native implementation or building up the FFI to be able to call differential equation packages. Assessing the community appetite and desire is a longer-term conversation to be had, of course.

An aside – I recall you and @wds15 spoke about wrapping a DDE library some time ago. What is the state of C++ libraries for DDE, SDE, IDE, DAE, and PDE libraries (IDE a type of differential equation that sees less use, so likely less need there)? I don’t really need to branch out to these other types of solvers for my research area and am lucky enough to be able to stay in ODEs, but it looks like they are of interest to other ecologists and biologists.

Well we can only trust CVODES as much as they are tested, and so I think if we can find some set of test ODEs with difficult numerics, we’re a long way to trusting ourselves.

Our interface is also simpler than the full CVODES interface, etc.

1 Like

You mean they don’t control error in the solutions at the supplied time points?

Yeah, they control the local error, so that’s timestep to timestep.

I think you can do significantly less about global error than you can local error.

There’s a section on this in the CVODES manual that talks about how they adjust their timestep/integrator order. It begins:

“A critical part of cvodes — making it an ODE “solver” rather than just an ODE method, is its control of local error. At every step, the local error is estimated and required to satisfy tolerance conditions, and the step is redone with reduced step size whenever that error test fails”

There’s a lot going on.

To see why this is not the case, consider y'=-20y, t_0=0, y_0=1.0. Solution y(t=1.0)=e^{-20}=2\times 10^{-9}. When we solve it in Stan

functions {
  real[] ode(real t, real[] y, real[] theta, real[] x_r, int[] x_i) {
      real r = theta[1];
      real dydt[1];
      dydt[1] =  - r * y[1];
      return dydt;
  }
}
data {
}
transformed data {
  int nt = 1;  
  real t[1] = {1.0};
  real theta[1] = {20.0};
  real t0 = 0.0;
  real y0[1] = {1.0};
  real x_r[0];
  int x_i[0];
}
generated quantities {
  real y_rk45[nt, 1];
  y_rk45 = integrate_ode_rk45(ode, y0, t0, t, theta, x_r, x_i, 1.e-4, 1.e-5, 1000);
}
./rk sample algorithm=fixed_param

gives y_1=2.0\times 10^{-6}. If the relative tolerance is on global error, we should expect

\frac{2.0\times 10^{-6} - 2.0\times 10^{-9}}{2.0\times 10^{-9}} < 10^{-4}

and this is obviously not true.

Yeah, and RK45 works so that at each step

  • the next value is solved using both a fourth-order and fifth-order rule
  • the local error of the fourth order method is estimated as their difference
  • the step is taken if a tolerance is satisfied, otherwise stepsize is reduced with some rule

Note that the step is always taken with the fifth-order rule. In addition, the step size scheme is independent of the desired output time points, and determining the solutions at those points are then an additional source of error.

1 Like

If a handful of people are up for it and nobody finds this effort too disagreeable/unpalatable, I’m up for getting the ball rolling on an experimental native switching solver in Stan once the variadic solver update is settled as a small intermediate step to future larger ODE solver updates.

1 Like

Do you think you could develop the switching algorithm in the Boost ODE framework?

That is our non-stiff solver now, and so if we could just tack on a switching thing to that which had rules for when to jump over to CVODES, that could work pretty slick with everything already there.

For a quick initial response, I’m thinking likely not and that going between libraries might be pretty troublesome. It might be more feasible to write something switching between solvers in Boost? Going to have to look more into this and read up some more.

Right, but we only have to go Boost → CVODES, so it only has to work one way which might make it a simpler proposition.

Good point. Based on the algorithms I’ve read about in the Mathematica documentation page and a discussion with Chris, it seems like once the one-way functionality is there, it’s not too much crazier to add some functionality for switching back (whether the switching back happens satisfactorily is the more difficult and less resolved question, which may ultimately mean the one-way is good enough for now). I’ll revert after I’ve learned more.

So, for stiffness detection, we would have to code something in Boost that estimates the dominant eigenvalue of the ODE Jacobian, and Mathematica writeup said that later LSODA dropped their stiffness detection because their methods of estimating the dominant eigenvalue through norm bounds could be quite inaccurate (some people seem to disagree with the dropping because it was still useful, even if inaccurate). Norm bounds seem more ok for smaller systems though. Not sure what the updates in eigenvalue estimation algs have been. Stiffness detection algs also seem to be different, depending on whether the non-stiff is RK or Adams-Moulton, so we would have to decide what the originating “non-stiff” ODE solver is. I imagine that starting with RK is the way to go.

I think an RK45 -> BDF switch covers more things than an Adams -> BDF switch.