Stan differential equation functionality development priorities

,

Unfortunately I’m not familiar with their work, let alone how it’s implemented.

Got it. I did a little bit more digging and found some slides from a course taught by the maintainer of the DifferentialEquations.jl package that details his algorithm implementation: Week 3 | Introduction to Numerical Methods | Mathematics | MIT OpenCourseWare if it is something you would be curious in skimming later.

I’ve been thinking about this area again after seeing the recent drive for ODE-related improvements in Stan, after seeing @Bob_Carpenter’s recently filed Stan math Github issues, https://github.com/stan-dev/math/issues/1957 and https://github.com/stan-dev/math/issues/1958. I’m wrapping up a project I was working on for one chapter of my PhD thesis, and I was thinking that I could get the ball rolling on development of some pertinent ODE feature that would not only benefit my ensuing projects, but also be a welcome feature for others. I discussed this with @bbbales2, who mentioned that in addition to the adjoint sensitivity analysis being worked on, the addition of more solvers such as an stiff/non-stiff auto-switching algorithm and general improvement of the ODE workflows (e.g. expanded ODE debugging) are two open areas for contribution that could be useful.

The addition of another reliable stiff solver to complement BDF, such as a Rosenbrock method, would be one of my interests. @yizhang, earlier you mentioned that you made headway on this front already with your C interface for FATODE, and I think that includes Rodas4? Rodas4 would be pretty good for some of the smaller stiff systems I’m looking to compare in a subsequent project.

I’d also be up for helping with the addition of a switching solver, but @yizhang has mentioned valuable background info above about the challenges of implementing one. People do seem to be pretty satisfied by the switching solvers Chris Rackauckas built in his DifferentialEquations.jl package (I’m not exactly sure what the recipe is, but I don’t think it’s translated from LSODA). He has mentioned to me that in practice, the algorithms he uses have issues with switching back to non-stiff after exiting stiff regions. Even with limitations, Chris has mentioned though that a switching solver that goes from non-stiff to stiff can still be useful for people less familiar with their systems, and with appropriate caveats, perhaps a one-way non-stiff to stiff switching solver could be useful for people who are uncertain about the state of their smaller dynamical systems. In the field I’m working in, biogeochemistry, there are a few active projects requiring statistical comparisons of recently formulated non-linear models for which the model properties are relatively unknown, and a non-stiff to stiff algorithm could help with some quick and dirty comparisons. In my own project, it took me some time to realize that some of my simulations were crashing because of model instability resulting in divergences and that for some regimes, I needed to use BDF rather than RK45 (this supports @bbbales2’s thought of ODE debugging in Stan and more input to users about algorithm selection). Another major challenge with the notion of a switching algorithm though is that as far as I can tell, the only one in a C/C++/Fortran library is LSODA; of course, natively implementing an algorithm in Stan Math is a challenge of another scale compared to wrapping a recipe from an existing library.

Anyhow, I was interested in updated input from @yizhang, @wds15, @charlesm93, and @bbbales2 about whether they think any of the above would be useful or tractable for contribution (or a nice pie-in-the-sky dream that’s worth iterating toward).

5 Likes

What’s the advantage of Rodas4? Over other things? What sorts of solving does it make easier?

I like this idea if the full switching thing is impractical.

I just went through the ode part of the manual (here) to update it for the variadic solvers and I think it’s time for a rewrite of some sort.

Things I’d like:

  1. The first example to be a model ready to fit. Generated quantities and a discussion of matrix exponentials goes after

  2. Information on how to pick between solvers. If the answer is, “generate simulated data and fit your model with different options and see what is the fastest”, so be it.

There’s a book “Solving Differential Equations in R” Karline Soetaert, Jeff Cash, and Francesca Mazzia that covers in the first 80 pages what ODEs are, what the basic methods are, examples for all the basic methods, and how to pick between them (which seems pretty good imo). It’s attached to the deSolve package in R.

I don’t think it available for free anywhere unfortunately.

@jtimonen was linking me to a page with a bunch of ODE benchmark problems a week or two ago (which is where the book came from) which makes me wonder – does anyone know of/have opinions on any collections of stiff/non-stiff ODE benchmark problems? Right now we’re rolling with the SIR cmdstan benchmark example and whatever @wds15 comes up with on the spot.

Hi,

I haven’t read this chain entirely but just gonna comment on this benchmark problem collection thing. The resource I once linked to @bbbales2 was this.

But in those problems the parameters are fixed, and there is no data or model, just the ODE. If one wants to turn them into benchmark problems for Bayesian inference, one would need to decide that some of the parameters are unknown and give them some priors. And then have some data and observation model, too. There exist some benchmark problems like this for parameter estimation in ODE models, but they usually consider just optimising using least squares or something. The parameter estimation section of DiffEqBenchmarks.jl considers also Bayesian inference.

Those are just what I remembered or found with quick search, but If someone knows good benchmark problems or problem sets for Bayesian parameter inference in ODE models, I am interested.

  1. How hard would it be to implement the one way non-stiff->stiff switching integrator?

  2. Is it possible to make that switch differentiable? Dunno how much this matters, but everything that is a function of parameters in the model should be differentiable (and the switch is a function of where in parameter space we are).

I was thinking about a very brute force / simple minded switching thing:

  • run RK45 with a relatively low max_num_steps
  • in case it throws the max_num_steps thing, then just rerun with bdf

it’s horribly brute-force and potentially super inefficient if the “switch” happens late, but maybe that’s already good enough for problems where warmup/sampling is stiff / non-stiff as implied by extreme parameter draws during warmup.

Maybe this is not the best solution, but easy to try.

I should’ve cited sources. Those weren’t written by me. I just copied over the Wiki pages when I was cleaning up the wiiki.

What are you working on specifically? I wrote a case study ages ago on soil-carbon compartment models, but I’ve never gotten around to extending it to general design matrices a la the soilR package, or to building better biomass models along the line of Allison et al. I’d still like to get back into fitting those models.

Was Rodas written in Julia? I couldn’t tell from glancing at the package doc.

As far as I can tell, every applied math professor on the planet have produced a set of notes like this for their intro ODE solver classes. They usually start with the Lotka-Volterra example, which was what made it so easy for me to take it and use it as a case study for Stan.

Random initialization out in the tail is a big problem for us. It can lead to stiffness in early iterations that would never show up if we started near the eventual solution.

We only need derivatives of solutions w.r.t. parameters. We don’t necessarily need the algorithm itself to be differentiable. How we solve the diff eq shouldn’t matter if we’re not differentiating the algorithm.

1 Like

With nothing there the logic would reduce to:

if(param > thresh) {
  method1(param);
} else {
  method2(param);
}

If method1 and method2 are only accurate to some tolerance, then there’ll probably be a discontinuity. Does it matter is part of the question.

That won’t need any special treatment assuming method1 and method2 are computing the same function with different approaches. They might not have the same tolerances, and one or both might throw some computational exception, but otherwise they’ll be computing the same function and derivatives.

It’s the same thing we do when branching on a parameter in the log functions to allow Taylor series to be employed when we’d otherwise get rounding.

Yeah exactly. I’m not trying to be overly pedantic here. I genuinely don’t know if it matters or not and I suspect there might be an easy bit of duct-tape.

My thesis involves establishing a framework for more robust biogeochemical model comparisons; coincidentally, my project involved comparing the models depicted in Allison et al., 2010 (Steve Allison is my advisor). I’d like to thank you for that case study (and definitely noted you in the Acknowledgments section of the manuscript), as it was a foundational help for the manuscript I submitted to Biogeosciences recently. That project very much was a supplement to your case study. The pre-print is available here: https://www.biogeosciences-discuss.net/bg-2020-23/. I’m a few drafts past that now, as I had to fix some things including proper implementation of LOO/WAIC, but the pre-print as is describes the fitting process I used and intended future directions. At some point in the next couple of months, I’d love to discuss some more about what you’d like to see done in this area. Carlos Sierra and Steve Allison are both into the idea of expanding the functionality of soilR so that it can function more as a model comparison testbed (or perhaps creating a new R package from scratch entirely). My immediate next projects involve demonstrating benchmarking of model predictive accuracy using multivariate soil respiration and microbial biomass data from the Harvard Forest soil warming and TRACE soil warming studies.

Rodas4 is written in Julia as part of DifferentialEquations.jl suite, but is an established member of the Rosenbrock family and I think is available under a different name as part of the FATODE library (@yizhang correct me if I’m wrong). Chris wants to get DifferentialEquations.jl into Stan, but that’s not possible at this point because of the state of foreign function interfaces in Stan. There was a curious comment in HackerNews about how one would go about getting DifferentialEquations.jl in Stan: In order to connect Stan with DifferentialEquations.jl the steps would be: 1. Cr... | Hacker News Otherwise one could replcate some solvers in Stan math, which would take less time, because FFIs are a separate beast.

Rodas4 is a 4th order Rosenbrock method. Chris gives an overview of pros and cons of Rosenbrock and BDF methods here, and it appears that Rosenbrock is superior to BDF for smaller stiff systems. BDF is superior for much larger systems, but I would guess that larger ODE systems with 100+ variables often are explored less often by Stan users, so Rosenbrock methods would be superior choices for most ODE system comparisons. Of course, I have to learn more of the literature here and am going off of summaries of Hairer’s Solving Ordinary Differential Equations II and Chris Rackauckas’ overviews for that.

2 Likes

Yeah, the late switch would worry me, but definitely an option to try.

I’m not familiar with stiffness detection, but it looks like Wolfram Reference has a page on this: Stiffness Detection—Wolfram Language Documentation. It seems like there’s now better ways for dominant Jacobian eigenvalue estimation there were not available at the time of CVODE’s creation.

As an aside, I’m reading about these “rejection steps” for stiffness detection as an algorithm steps to an ODE solution that reminds me of Markov chain algorithms. If somebody’s not already working on that (they probably are), there’s probably some line of research that’s definitely not for me to tackle at this point involving applying Markov chain algorithms to ODE solutions.

It does not matter.

Consider this program:

real y = x > 1e-10 ? log(1 - x) : log1p(-x);

or if you prefer

real y;
if (x > 1e-10)
  y = log(1 - b)
else
  y = log1p(-b)

Just two different ways of implementing the same function. Up to arithmetic precision, there’s no discontinuity at 1e-10.

I guess that answers the question of whether he’s still working on these models. Tell him I said “hi”! I met him at a soil carbon workshop at Biosphere II a few years ago. I roomed with Carlos Sierra. It was one of the most fun workshops I’ve ever been to. I also met Kiona Ogle, who’s doing some really nice Bayesian modeling, and Kathe Todd-Brown, who also wanted to build more biologically sophisticated ODE models.

If you want to talk about evaluating these kinds of models, I’m happy to hop on a chat and talk about it. I’d love to hear how this work is going.

1 Like

Will definitely pass on the hello! I think Steve would love to talk more about this, as he wants to bring a lot more statistical rigor to the biogeochemistry community. I’ll reach out to you separately about these projects over the next couple of months and will also pass over the manuscript when it’s hopefully accepted. We’ll definitely need some help to point us the right way, as the non-linear models are a bit of an issue (since the models need to be initiated at steady state to isolate the perturbation response from the transient activity of a model heading to steady state). The non-linear models without analytical steady states otherwise have to be “spun up” to steady state or root solving algorithms need to be used. The original manuscript actually was supposed to compare some other models, but AWB from Allison et al. was one of the few non-linear models with an analytical steady state expression for each of its state variables. Without root solving algorithms in Stan, it looks like spin-up is the only option for some of these non-linear models, and then confirmation of steady state and per-MC iteration speed become big issues. This led to Steve asking me to investigate ODE solving in Stan to see whether the speed situation could be helped by adding solvers or a switching algorithm. Anyhow, will chat about this later!

That’d be great (for me because I love these problems and for the field because biogeochem really needed better stats when I looked in a few years ago).

I just moved from Columbia Uni to the Flatiron Institute’s Center for Computational Mathematics, where my colleagues are mostly experts in diff eq solvers. I’ll still be working on Stan, but I’m hoping to somehow put my new group and former group together with an eye toward better diff eq solving.

I’m not sure what “spun up” means. Is it for initialization?

I was somehow under the impression we could use the matrix exponential to calculate steady-state solutions. Or does that only work for linear systems? I know @charlesm93 was investigating the stability of the matrix exponential for this kind of thing.

1 Like

Rosenbrock needs to update Jacobian at every step, while covdes’ BDF doesn’t have to. Using autodiff for Jacobian at every step may induce penalty on this claim.

The biggest concern for adding an ODE integrator would be its verification and future maintenance. Most time I’d put money on something with years’ track record(LSODA, CVODES) rather than new implementation.

2 Likes