Costs/benefits of solving large ODE systems as systems of difference equations/in discrete time

I have been working with large systems of ordinary differential equations in a systems modelling context, similar to the ‘system dynamics’ literature of Jay Forrester, John Sterman et al. System dynamics models often include a large number of ODEs (20+ is not uncommon) and parameters, and data for some of the state variables may never be available/have a high proportion of missingness. For example, see the C-ROADS model (

Solving large systems of ODEs in Stan obviously takes a long time (up to days/weeks in certain cases with a lot of data), although I have managed to increase their speed by ensuring all parameters are on a similar scale and amending the tolerances etc.

I was reading this paper comparing dynamic Bayesian Markov models to ‘gold standard’ Bayesian ODEs (, where the authors essentially converted the system of ODEs they were working with to difference equations, and fit the models using WinBugs (for the ODE model) and JAGS. They found that the discretised dynamic Markov model gave comparable estimates to the Bayesian ODE, with a massive speed increase.

I changed some of my Stan programs to difference equations rather than using integrate_ode_rk45, and also saw a massive speed increase - one model on simulated data went from running for ~2 days to under 20 minutes, with no signs of convergence problems (no diveregent transitions, good n_eff) or any real decrease in accuracy of the results.

As a small example, I have attached a logistic model coded as an ODE system and as a difference equation/discrete-time system (DTS), showing that the estimates are reasonably comparable.

The ODE function looks like:

// logisitc ode model
  real[] m(real t, real[] z, real[] theta, real[] x_r, int[] x_i){
    real dydt = z[1] * theta[1] * (1 - z[1]/theta[2]);
    return{ dydt };

and the DTS system looks like:

// logisitc dts model
  real[] m_dts(int T, real[] z, real[] theta){
    real n[T];
    n[1] = z[1];
    for(t in 1:(T-1)){
      n[t+1] = n[t] + n[t] * theta[1] * (1 - n[t]/theta[2]);
    return n[];

Here is the ODE output, where the true parameters were y_init = 1, sigma = 0.1, theta[1] = 0.2 and theta[2] = 200:

            mean se_mean   sd   2.5%    25%    50%    75%  97.5% n_eff Rhat
y_init[1]   0.95    0.00 0.04   0.86   0.92   0.95   0.98   1.03  2042    1
sigma       0.11    0.00 0.01   0.10   0.11   0.11   0.12   0.13  2807    1
theta[1]    0.20    0.00 0.00   0.20   0.20   0.20   0.21   0.21  1966    1
theta[2]  196.98    0.05 2.74 191.52 195.08 197.02 198.85 202.32  3069    1

and here is the DTS output:

            mean se_mean   sd   2.5%    25%    50%    75%  97.5% n_eff Rhat
y_init[1]   1.19    0.00 0.05   1.10   1.16   1.19   1.23   1.30  2035    1
sigma       0.11    0.00 0.01   0.10   0.11   0.11   0.12   0.13  3375    1
theta[1]    0.22    0.00 0.00   0.21   0.22   0.22   0.22   0.23  1695    1
theta[2]  195.61    0.05 2.62 190.47 193.88 195.56 197.40 200.59  2620    1

And here are the model predictions (thin black lines - predictions are without added noise) plotted against the simulated noisy data (red dots) and actual simulated model (thick blue line):

There are some drops in accuracy, even in this small model, but I have not found them to be particularly worrying, even in larger models where differences between continuous and discrete time modelling may be more evident (i.e. models where ordering of events has a potentially large effect on the outputs).

I’d be really interested to hear people’s opinions about the costs and benefits of this approach over using the ODE solvers in Stan. Obviously, using the ODE solvers is the gold standard, but when there is a potentially 10-20 times speed up in model runs for a relatively small drop in accuracy, my thinking is that it could be a useful tool.

I will try to look into more complex models in a more systematic fashion too.

r-logistic-example.R (1.7 KB) logistic-ode.stan (805 Bytes) logistic-dts.stan (760 Bytes)




Neat. If I read this right, then you are doing an Euler forward integration here. This needs to be used with caution given that this integration scheme is unstable and inaccurate for difficult functions.

I have myself used a similar trick in a different setting. In the example I ran across I was integrating a turn-over ODE model which was driven by another input function. The turn-over ODE cannot be solved analytically for general input functions… but whenever the input was constant it is possible to solve the turn-over ODE. What I ended up doing is to integrate the turn-over ODE in smallish steps. So I did replace the smooth input function by a step function. During each step the input function thus became constant. This trick brought down the Stan runtime from 3 days to just 8h !

The best thing about these tricks is that you can usually approach the gold standard in order to check the validity of the approximation. In my case I did half the step-size to confirm to get the same result. In your case you can also make the step-size smaller to check if things work… and if still in doubt you can run the ODE integrator with a scaled down problem size.

In short, I am a huge fan of getting rid of ODE solvers with some approximations and I will look into the paper you cited. These tricks are extremely valuable as the overhead from ODE solvers is enormous.



Yes, this is a Euler integration with dt = 1. I have decreased the step size and get more accurate theta results in line with integrate_ode_rk45, although the inital value is still not a good estimate (more influenced by the first data point).

That’s interesting! I think having a parallel version of the ODE integrators would be helpful, which I appreciate you are working on/have some toy examples of. I still haven’t gotten my head around whether it is possible to parallelize all ODE systems though.

Solving these larger models has got me thinking about the differences between continuous and discrete time models in general, so I am working on building some of the models I work with as discrete time systems, even agent based models (ABMs), and running these in Stan. One issue is that a lot of discrete/ABMs include randomness in their model runs, though, and Stan obviously doesn’t allow the rng functions in the functions block.

If you have a hierarchical model where you need to solve the same ODE for, say, many independent patients - then you can use map_rect parallelism. I did that here

And can speed up a data set with more than 1000 patients to run instead of in 3 days in just 1h when using 80 cores per chain. So you see - brute force is a way forward. The analytical trick (which is on that poster) gives you 3 days => 8h on a single core. This approximate model can benefit from parallezation as well - here we go from 8h on 1 core down to 45 minutes with merely 16cores.


Yeah this happens. First time I saw this was doing forward Euler on a diffusion model working better than the solvers, which doesn’t make any sense (though my memory is cloudy – I think diffusion is a standard example of why you want backward Euler over forward Euler). We gotta get around to investigating this.

Thanks for sharing the model. That’s super helpful to have!

The easy way to check stuff like this is just put one of the Stan ODE solves with high precision in generated quantities and compare the solution that returns with the approximation you made.

Just look at something like the maximum absolute and relative error at every timepoint and state of the ODEs. Shouldn’t be too expensive to compute and you can just look at the distribution of errors you’re making and decide if you like the approximation or not.

1 Like

This thread makes me idly wonder whether a simpler solver + uncertainty (some form of sde) might be handy for this stuff. Could be that the computational gains from the simpler solver are then lost by incorporating uncertainty though.

1 Like

I’ve also used discretized solvers in my StanCon 2018 submission. In my case, I’ve found the accuracy of Euler to be OK, but systematically biased compared to RK45. However, the Trapezoidal rule achieved (in my case) results literally indistinguishable from RK45 for just a tiny increase in computation cost and code complexity (compared to Euler).


Would it make sense to have more elementary ODE solvers, on top of the three solvers we currently have (rk45, bdf, and adams)?

Also, am I correct in assuming none of these approximations work for stiff ODEs?

It bears investigating, for sure. And probably a look into the complex solvers too.

The diffusion problem fits this I think, which is why it bothered me.

For scalable ODE solvers we should really look into

which does a lot of cool stuff (adjoint solving, sparse, ODE events, ODE root finding)

(and is written in C++ with an adequate license)