What's the usage of ode_adams?

@bbbales2 & @wds15 have lamented for quite a while the lack of application of ode_adams in their modeling & development practice. This is partially my fault in being terse in documentation and later procrastination, and partially due to the bias of models used in Stan’s testing & benchmark, which is really a bias of fields that Stan has touched so far. I’ve started amend the issue in documentation, and here would like to give an example in which ode_adams triumphs.

So far Stan’s ODE development routinely involves only a handful of examples: predator-prey model by @Bob_Carpenter , epiomology model as found in posteriordb, a PKPD model,, a textbook oscillation model, and Lorenz attractor model. The model below is not really more sophisticated, in fact, one can argue it is simpler for being linear:

y''(t) + ty(t) = 0, \, y(0)=y_0, \, y'(0) = 0.

This is Airy’s equation, and in code it’s just

functions {
  vector airy(real t, vector y) {
  return [ y[2], -t * y[1] ]';
}
}

Assuming we want to infer y_0 given y observed at t=90,91,\dots,100:

data {
 int n;  // 11 data points from t=90 to 100
 real ts[n];
 vector[n] yobs;
 real rtol; // 1e-10
 real atol; // 1e-10
}

and also want to ensure the accuracy at these points is at scale of 1e-6. Consider the equation has frequent applied in fields optics and electromagnetics, this is not too much to ask for instrumentation. Since it has analytical solution, we can backtrace to determine that tolerance required should be at least 1e-10 in ODE solvers. This happens to be the default of BDF & Adams solvers.

parameters {
 real<lower=0> y0;
 real<lower=0> sigma;
}

transformed parameters {
 vector[2] yinit = [y0, 0.0]';
 vector[2] y[n] = ode_adams_tol(airy, yinit, 0.0, ts, rtol, atol, 1000000); // or bdf or rk45
}

model {
  y0 ~ cauchy(0, 0.5);
  sigma ~ normal(0, 0.01);
  for (i in 1:n) {
  yobs[i] ~ normal(y[i][1], sigma);
}
}

Simulated data based on y0=1.0 sigma=0.01:

n <- 11
ts <- c(90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100)
yobs <- c(
 -0.27204644,  0.26198631, -0.25835151,  0.22486253, -0.16963251,  0.09098462,
  0.02549337, -0.14729033,  0.23571404, -0.27632096,  0.25950532)
rtol <- 1.e-10
atol <- 1.e-10

Fitting from init y0=1.1; sigma=0.1, regardless integrator used, we get something like

However, the time performance among the three integrators are very different:

with adams takes less than half of the time than bdf. To see why, we print out the order used in each Adams time step(keep in mind BDF & Adams are variant-order, with BDF order 1-5 and Adams order 1-12), in a fixed_param run:

> fit <- mod$sample(data = "airy.data.R", init="airy.init.R", iter_sampling=1, fixed_param=TRUE, seed=987)
Running MCMC with 1 chain...

Chain 1 Last order: 9 
Chain 1 Last order: 9 
Chain 1 Last order: 9 
Chain 1 Last order: 9 
Chain 1 Last order: 9 
Chain 1 Last order: 9 
Chain 1 Last order: 9 
Chain 1 Last order: 9 
Chain 1 Last order: 9 
Chain 1 Last order: 9 
Chain 1 Last order: 9 
Chain 1 Last order: 7 
Chain 1 Last order: 7 
Chain 1 Last order: 7 
Chain 1 Last order: 7 
Chain 1 Last order: 7 
Chain 1 Last order: 7 
Chain 1 Last order: 7 
Chain 1 Last order: 7 
Chain 1 Last order: 7 
Chain 1 Last order: 7 
Chain 1 Last order: 7 

This shows Adams is constantly using much higher orders than rk45 and the max order of BDF, so it can take less steps to reach a same accuracy. It is this capability of bumping up to high orders in smooth non-stiff problems that makes Adams suitable to problems that requires higher accuracy, such as in orbital dynamics and oceanography. Some simulations in these fields require a long-duration time(where is Mercury in year 2050?) and one must reduce tolerance to avoid accumulation of local errors.

The model & data in this post can be found in Torsten’s example models.

4 Likes

Thanks for the example. Would it be fair to summarise that Adams is good for problems where the function is very smooth and we need to get the solution for a long duration with high precision?

You write

Some simulations in these fields require a long-duration time(where is Mercury in year 2050?) and one must reduce tolerance to avoid accumulation of local errors.

and this is exactly what we saw in the past: When running Adams with tolerances of 1E-6 the solver started to return useless results which were just wrong. Only when running Adams at high tolerances of 1E-10 or so I got useful results from the fits. This is what lead us (@bbbales2 and myself) to switch how errors are being checked when sensitivities are used which lead to a more conservative behaviour in step sizes (that is we include now the sensitivities in the error metric rather than leaving these out).

How would the above example fare if we reduce the target tolerance to 1E-6 during fitting (but simulate with 1E-10) ? Is RK45 still much slower?

Oh okay so this is the selling point for Adams? It is effectively a much higher order method, and so if a solution is smooth and we might expect a high order polynomial to interpolate it well, then we might also expect Adams to work well (because it can take huge steps with much smaller error).

So in this example the solution is very smooth, and so Adams, even though it is an implicit solver, is able to take much bigger steps than RK45 and get the solution faster.

High precision is the key here. Long duration is just one of the reasons that one may want to consider high precision. After all “long” is relative.

Not sure I understand the "target tolerance to 1E-6 during fitting (but simulate with 1E-10) " part but I expect rk45 is faster if we lower atol & atol to 1e-6. I’ve been busy so a quick check is to see the order of Adams with this lower tol:

> sim  <- mod$sample(data="airy.data.R",init="airy.init.R",fixed_param=1,iter_sampling=1)
Running MCMC with 1 chain...

Chain 1 last order: 5 
Chain 1 last order: 5 
Chain 1 last order: 5 
Chain 1 last order: 5 
Chain 1 last order: 5 
Chain 1 last order: 5 
Chain 1 last order: 5 
Chain 1 last order: 5 
Chain 1 last order: 5 
Chain 1 last order: 6 
Chain 1 last order: 6 
Chain 1 last order: 5 
Chain 1 last order: 5 
Chain 1 last order: 5 
Chain 1 last order: 5 
Chain 1 last order: 5 
Chain 1 last order: 5 
Chain 1 last order: 5 
Chain 1 last order: 5 
Chain 1 last order: 5 
Chain 1 last order: 5 
Chain 1 last order: 5 

As you can see, Adams is using approximately same order of rk45. Since it has to pay for nonlinear solver, I expect it’ll be slower in this tolerance.

Great. I get it now.

We really need an overview page in our doc stating which application cases each solver is intended to be used for. To me Adams was until now entirely useless to the point that I thought we should remove it… but there are applications obviously for it where it shines. That needs a mention in our doc.