Adams Moulton ODE solver in Stan useful?

Hi!

@bbbales2 and myself have found that the Adams Moulton solver in stan is very bad in terms of speed and accuracy in the problems we tried, see here https://github.com/stan-dev/math/pull/1641#issuecomment-642513156 for details.

The example I tried is non-stiff which is supported by the observation that the RK45 solver handles the ODE problem (2-cmt oral dosing, 3 states, 5 parameters) well; so the RK45 integrator works a lot faster than the respective stiff BDF solver from CVODES. However, the Adams-Moulton solver just stalls and is terrible in handling this problem - no matter how I set the tuning parameters.

So I am wondering if there are users who have used Adams-Moulton successfully in their problems and what these problems were. My concern is that there are bigger issues with the Adams-Moulton and we need to revise this solver to make it useful, which it seems it is not right now.

Maybe @yizhang has an idea or some suggestion here (given this is used to my knowledge in Torsten)? Pinging @charlesm93 as well.

Sebastian

2 Likes
#include <stan/math/rev.hpp>
#include <gtest/gtest.h>
#include <stan/math/rev/functor/gradient.hpp>
#include <iostream>
#include <sstream>
#include <vector>
#include <stdexcept>
#include <chrono> 
using namespace std::chrono; 

class b5 {
 public:
  template <typename T0, typename T1, typename T2>
  inline std::vector<stan::return_type_t<T1, T2>>
  operator()(const T0& t_in, const std::vector<T1>& y,
             const std::vector<T2>& theta, const std::vector<double>& x,
             const std::vector<int>& x_int, std::ostream* msgs) const {
    if (y.size() != 6)
      throw std::domain_error("Functor called with inconsistent state");

    std::vector<stan::return_type_t<T1, T2>> dydt(6);
    const T2& a = theta[0];
    dydt[0] = -10.0 * y[0] + a * y[1];
    dydt[1] = -a * y[0] - 10.0 * y[1];
    dydt[2] = -4.0 * y[2];
    dydt[3] = -1.0 * y[3];
    dydt[4] = -0.5 * y[4];
    dydt[5] = -0.1 * y[5];
    return dydt;
  }
};

TEST(demo, rk45) {
  std::vector<stan::math::var> y0{1, 1, 1, 1, 1, 1};
  double t0 = 0.0;
  std::vector<double> ts{1.0, 100.0};
  std::vector<double> theta{100.0};
  std::vector<double> x_r;
  std::vector<int> x_i;

  auto start = high_resolution_clock::now(); 
  std::vector<std::vector<stan::math::var>> ys = stan::math::integrate_ode_rk45(b5(), y0, t0, ts, theta, x_r, x_i, 0, 1.e-14, 1.e-20, 100000);
  auto stop = high_resolution_clock::now(); 
  auto duration = duration_cast<microseconds>(stop - start);
  std::cout << "rk45 timed: " << duration.count() << "\n";
}

TEST(demo, bdf) {
  std::vector<stan::math::var> y0{1, 1, 1, 1, 1, 1};
  double t0 = 0.0;
  std::vector<double> ts{1.0, 100.0};
  std::vector<double> theta{100.0};
  std::vector<double> x_r;
  std::vector<int> x_i;

  auto start = high_resolution_clock::now(); 
  std::vector<std::vector<stan::math::var>> ys = stan::math::integrate_ode_bdf(b5(), y0, t0, ts, theta, x_r, x_i, 0, 1.e-14, 1.e-20, 100000);
  auto stop = high_resolution_clock::now(); 
  auto duration = duration_cast<microseconds>(stop - start);
  std::cout << "bdf timed: " << duration.count() << "\n";
}

TEST(demo, adams) {
  std::vector<stan::math::var> y0{1, 1, 1, 1, 1, 1};
  double t0 = 0.0;
  std::vector<double> ts{1.0, 100.0};
  std::vector<double> theta{100.0};
  std::vector<double> x_r;
  std::vector<int> x_i;

  auto start = high_resolution_clock::now(); 
  std::vector<std::vector<stan::math::var>> ys = stan::math::integrate_ode_adams(b5(), y0, t0, ts, theta, x_r, x_i, 0, 1.e-14, 1.e-20, 100000);  
  auto stop = high_resolution_clock::now(); 
  auto duration = duration_cast<microseconds>(stop - start);
  std::cout << "adams timed: " << duration.count() << "\n";
}

Timed on stan 2.23:

mpirun -np 1 test/unit/math/rev/integrate_ode_adams_perf_test --gtest_output="xml:test/unit/math/rev/integrate_ode_adams_perf_test.xml"
Running main() from lib/gtest_1.8.1/src/gtest_main.cc
[==========] Running 3 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 3 tests from demo
[ RUN      ] demo.rk45
rk45 timed: 191379
[       OK ] demo.rk45 (192 ms)
[ RUN      ] demo.bdf
bdf timed: 229499
[       OK ] demo.bdf (229 ms)
[ RUN      ] demo.adams
adams timed: 74051
[       OK ] demo.adams (74 ms)
[----------] 3 tests from demo (495 ms total)

[----------] Global test environment tear-down
[==========] 3 tests from 1 test case ran. (495 ms total)
[  PASSED  ] 3 tests.
2 Likes

Thanks! This should be good enough to do the diagnosing.

Thanks. The tolerances in these examples is very low. Is this a general thing to recommend with the Adams Moulten? I wonder if one really gets this high accuracy. Hence, one should probably compare these problems vs an analytic solution.

Is there experience with Adams Moulten in Torsten?

There’s no “these”, it’s just one example, so I wouldn’t generalize, in particular

Hos “low” is “low tolerance”, and how “high” is “high accuracy”? Talking about an integrator’s tolerance & accuracy without an actual ODE or integration interval is as meaningful as asking “is 3 a lot?”.

I don’t understand this statement.

I don’t understand the question. I have used both Stan’s & Torsten’s AM integrators. They’re just a same method with two implementations.

Whow…that was very constructive…so let’s park this simply.

Just tried this problem out in the new branch:

rk45 timed: 179593
bdf timed: 203776
adams timed: 64780

And with the new interfaces:

rk45 new timed: 161732
bdf new timed: 197372
adams new timed: 62082

So I tried the adams stuff on a version of the Lotka-Volterra case study and it was much slower than rk45.

Is it obvious why that would be the case (can post code if that seems wrong)?

For a same attainable order of accuracy and same time step, AM is more expensive because it’s implicit.

Well then why is RK45 slower in the problem above?

And docs say the Adams-Moulton method will vary between 1st and 12th order.

Because to attain the solution near t_0 and t_{\text{final}} rk45 needs to reduce time step aggressively, while AM doesn’t have to because it can enlist higher-order to reduce truncation error.

Here’s another example, related to the orbital dynamics that @s.maskell mentioned here. In the ODE system(2body), I’m using the default adams/bdf tolerance, and Adams is faster than the other two. Then I changed the ODE system to a trivial linear one defined by identity matrix, and use a “very low” tolerance, and this time adams is still faster. The example is intended to demonstrate that without involving the nature of ODE system, talking about “high” & “low” tolerance is meaningless, let alone if an integrator “is useful”.

class two_body {
 public:
  template <typename T0, typename T1, typename T2>
  inline std::vector<stan::return_type_t<T1, T2>>
  operator()(const T0& t_in, const std::vector<T1>& y,
             const std::vector<T2>& theta, const std::vector<double>& x,
             const std::vector<int>& x_int, std::ostream* msgs) const {
    if (y.size() != 4)
      throw std::domain_error("Functor called with inconsistent state");

    T1 r = sqrt(y[0]*y[0] + y[2]*y[2]);
    std::vector<stan::return_type_t<T1, T2>> dydt(4);
    dydt[0] = y[1];
    dydt[1] = -y[0]/pow(r, 3.0);
    dydt[2] = y[3];
    dydt[3] = -y[2]/pow(r, 3.0);
    return dydt;
  }
};

TEST(demo, two_body) {
  double t0 = 0.0;
  std::vector<double> ts{1.0, 100.0};
  std::vector<double> theta;
  std::vector<double> x_r;
  std::vector<int> x_i;

  {
    std::vector<stan::math::var> y0{2, 0, 0, 0.5};
    auto start = high_resolution_clock::now(); 
    std::vector<std::vector<stan::math::var>> ys = stan::math::integrate_ode_rk45(two_body(), y0, t0, ts, theta, x_r, x_i, 0, 1.e-10, 1.e-10, 100000);
    auto stop = high_resolution_clock::now(); 
    auto duration = duration_cast<microseconds>(stop - start);
    std::cout << "rk45 timed: " << duration.count() << "\n";
  }

  {
    std::vector<stan::math::var> y0{2, 0, 0, 0.5};
    auto start = high_resolution_clock::now(); 
    std::vector<std::vector<stan::math::var>> ys = stan::math::integrate_ode_bdf(two_body(), y0, t0, ts, theta, x_r, x_i, 0, 1.e-10, 1.e-10, 100000);
    auto stop = high_resolution_clock::now(); 
    auto duration = duration_cast<microseconds>(stop - start);
    std::cout << "bdf timed: " << duration.count() << "\n";
  }

  {
    std::vector<stan::math::var> y0{2, 0, 0, 0.5};
    auto start = high_resolution_clock::now(); 
    std::vector<std::vector<stan::math::var>> ys = stan::math::integrate_ode_adams(two_body(), y0, t0, ts, theta, x_r, x_i, 0, 1.e-10, 1.e-10, 100000);
    auto stop = high_resolution_clock::now(); 
    auto duration = duration_cast<microseconds>(stop - start);
    std::cout << "adams timed: " << duration.count() << "\n";
  }
}

class degen {
 public:
  template <typename T0, typename T1, typename T2>
  inline std::vector<stan::return_type_t<T1, T2>>
  operator()(const T0& t_in, const std::vector<T1>& y,
             const std::vector<T2>& theta, const std::vector<double>& x,
             const std::vector<int>& x_int, std::ostream* msgs) const {
    if (y.size() != 4)
      throw std::domain_error("Functor called with inconsistent state");

    T1 r = sqrt(y[0]*y[0] + y[2]*y[2]);
    std::vector<stan::return_type_t<T1, T2>> dydt(4);
    dydt[0] = -y[0];
    dydt[1] = -y[1];
    dydt[2] = -y[2];
    dydt[3] = -y[3];
    return dydt;
  }
};

TEST(demo, rk45_degen) {
  double t0 = 0.0;
  std::vector<double> ts{1.0, 100.0};
  std::vector<double> theta;
  std::vector<double> x_r;
  std::vector<int> x_i;

  {
    std::vector<stan::math::var> y0{2, 0, 0, 0.5};
    auto start = high_resolution_clock::now(); 
    std::vector<std::vector<stan::math::var>> ys = stan::math::integrate_ode_rk45(degen(), y0, t0, ts, theta, x_r, x_i, 0, 1.e-14, 1.e-16, 100000);
    auto stop = high_resolution_clock::now(); 
    auto duration = duration_cast<microseconds>(stop - start);
    std::cout << "rk45 timed: " << duration.count() << "\n";
  }

  {
    std::vector<stan::math::var> y0{2, 0, 0, 0.5};
    auto start = high_resolution_clock::now(); 
    std::vector<std::vector<stan::math::var>> ys = stan::math::integrate_ode_bdf(degen(), y0, t0, ts, theta, x_r, x_i, 0, 1.e-14, 1.e-16, 100000);
    auto stop = high_resolution_clock::now(); 
    auto duration = duration_cast<microseconds>(stop - start);
    std::cout << "bdf timed: " << duration.count() << "\n";
  }

  {
    std::vector<stan::math::var> y0{2, 0, 0, 0.5};
    auto start = high_resolution_clock::now(); 
    std::vector<std::vector<stan::math::var>> ys = stan::math::integrate_ode_adams(degen(), y0, t0, ts, theta, x_r, x_i, 0, 1.e-14, 1.e-16, 100000);
    auto stop = high_resolution_clock::now(); 
    auto duration = duration_cast<microseconds>(stop - start);
    std::cout << "adams timed: " << duration.count() << "\n";
  }
}

output

mpirun -np 1 test/unit/math/rev/integrate_ode_adams_2_perf_test --gtest_output="xml:test/unit/math/rev/integrate_ode_adams_2_perf_test.xml"
Running main() from lib/gtest_1.8.1/src/gtest_main.cc
[==========] Running 2 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 2 tests from demo
[ RUN      ] demo.two_body
rk45 timed: 12119
bdf timed: 16825
adams timed: 10083
[       OK ] demo.two_body (39 ms)
[ RUN      ] demo.rk45_degen
rk45 timed: 5143
bdf timed: 5859
adams timed: 2391
[       OK ] demo.rk45_degen (13 ms)
[----------] 2 tests from demo (52 ms total)

[----------] Global test environment tear-down
[==========] 2 tests from 1 test case ran. (53 ms total)
[  PASSED  ] 2 tests.

In this last model, let’s look at something that @charlesm93 had worked on.

struct TwoCptNeutModelODE {
  template <typename T0, typename T1, typename T2>
  inline std::vector<typename stan::return_type<T1, T2>::type>
  operator()(const T0& t_in,
             const std::vector<T1>& x,
             const std::vector<T2>& theta,
             const std::vector<double>& x_r,
             const std::vector<int>& x_i,
             std::ostream* msgs) const {
    using scalar_type = typename stan::return_type<T1, T2>::type;

    const T2& CL    = theta[0];
    const T2& Q     = theta[1];
    const T2& V1    = theta[2];
    const T2& V2    = theta[3];
    const T2& ka    = theta[4];
    const T2& mtt   = theta[5];
    const T2& circ0 = theta[6];
    const T2& gamma = theta[7];
    const T2& alpha = theta[8];

    T2 k10 = CL / V1;
    T2 k12 = Q / V1;
    T2 k21 = Q / V2;
    T2 ktr = 4 / mtt;
    
    std::vector<scalar_type> dxdt(8);
    dxdt[0] = -ka * x[0];
    dxdt[1] = ka * x[0] - (k10 + k12) * x[1] + k21 * x[2];
    dxdt[2] = k12 * x[1] - k21 * x[2];    
    scalar_type conc = x[1]/V1;
    scalar_type EDrug = alpha * conc;

    // x[4], x[5], x[6], x[7] and x[8] are differences from circ0.
    scalar_type prol = x[3] + circ0;
    scalar_type transit1 = x[4] + circ0;
    scalar_type transit2 = x[5] + circ0;
    scalar_type transit3 = x[6] + circ0;
    scalar_type circ = stan::math::fmax(stan::math::machine_precision(), x[7] + circ0);

    dxdt[3] = ktr * prol * ((1 - EDrug) * (pow(circ0 / circ, gamma) - 1));
    dxdt[4] = ktr * (prol - transit1);
    dxdt[5] = ktr * (transit1 - transit2);
    dxdt[6] = ktr * (transit2 - transit3);
    dxdt[7] = ktr * (transit3 - circ);

    return dxdt;
  }
};

with

    ts        {0.083, 0.167, 0.25, 0.5, 0.75, 1, 1.5, 2,3,4,6,8,12,24,36,48,60,72,1000},
    theta     {10, 15, 35, 105, 2, 10, 5, 0.17, 2.0e-4},
    y0        {100.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0}

and this time I’m using a tolerance even lower than the default.

TEST_F(TorstenOdeTest_neutropenia, rk45_demo) {
  vector<var> theta_var = stan::math::to_var(theta);
  auto start = high_resolution_clock::now(); 
  vector<vector<var> > y = integrate_ode_rk45(f, y0, t0, ts, theta_var, x_r, x_i,0, 1.e-7, 1.e-8, 1000000);
  auto stop = high_resolution_clock::now(); 
  auto duration = duration_cast<microseconds>(stop - start);
  std::cout << "rk45 timed: " << duration.count() << "\n";
}

TEST_F(TorstenOdeTest_neutropenia, BDF_demo) {
    vector<var> theta_var = stan::math::to_var(theta);
    auto start = high_resolution_clock::now(); 
    vector<vector<var> > y = integrate_ode_bdf(f, y0, t0, ts, theta_var, x_r, x_i,0, 1.e-7, 1.e-8, 1000000);
    auto stop = high_resolution_clock::now(); 
    auto duration = duration_cast<microseconds>(stop - start);
    std::cout << "BDF timed: " << duration.count() << "\n";
}

TEST_F(TorstenOdeTest_neutropenia, AM_demo) {
    vector<var> theta_var = stan::math::to_var(theta);
    auto start = high_resolution_clock::now(); 
    vector<vector<var> > y = integrate_ode_adams(f, y0, t0, ts, theta_var, x_r, x_i,0, 1.e-7, 1.e-8, 1000000);
    auto stop = high_resolution_clock::now(); 
    auto duration = duration_cast<microseconds>(stop - start);
    std::cout << "AM timed: " << duration.count() << "\n";
}

Output

mpirun -np 1 test/unit/math/rev/integrate_ode_adams_2_perf_test --gtest_output="xml:test/unit/math/rev/integrate_ode_adams_2_perf_test.xml"
Running main() from lib/gtest_1.8.1/src/gtest_main.cc
[==========] Running 3 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 3 tests from TorstenOdeTest_neutropenia
[ RUN      ] TorstenOdeTest_neutropenia.rk45_demo
rk45 timed: 10968
[       OK ] TorstenOdeTest_neutropenia.rk45_demo (11 ms)
[ RUN      ] TorstenOdeTest_neutropenia.BDF_demo
BDF timed: 4918
[       OK ] TorstenOdeTest_neutropenia.BDF_demo (5 ms)
[ RUN      ] TorstenOdeTest_neutropenia.AM_demo
AM timed: 5416
[       OK ] TorstenOdeTest_neutropenia.AM_demo (6 ms)
[----------] 3 tests from TorstenOdeTest_neutropenia (22 ms total)

[----------] Global test environment tear-down
[==========] 3 tests from 1 test case ran. (22 ms total)
[  PASSED  ] 3 tests.

In the context of a Stan program there is an implicit definition of very high and very low tolerance, since the problems should be scaled to unity scale roughly. Only then Stan programs work well. Or do you think that is not meaningful?

You mention that it is a problem for rk45 to output at some specific time points due to the need to exactly hit the time point, but I think our odeint implementation avoids this problem as the integrator we use implements a so called “dense output”:

Dense output steppers also can interpolate the solution to calculate the state x(t’) at any point t <= t’ <= t+dt .

The examples you show look nice, but I wonder how stable AM is against wild parameters being tried. Maybe AM is good as a non stiff solver after running warmup with bdf.

If Stan can’t set these options automatically, then that leaves it to someone writing the model.

Right now the only option seems to be writing Stan models with the different methods and benchmarking them separately.

That isn’t great. I wonder if there is something we can do to smooth the workflow.

How do people pick their ODE methods regularly? What’s the process?

Where do I say that??

I think he’s referring to this comment, where there’s some special difficulty for rk45 near t_0 vs. t_{final}. Wasn’t totally clear to me what was happening there either.

If you mean picking integrate scheme automatically, it’s an ambitious goal that requires many efforts. IMO it’s as “not great” as we cannot choose prior automatically, as both depends on the nature the model. I don’t think there’s any solver suite provides anything beyond very general guidance.

C’mon guys, did you take a look of the ODEs? It’s linear and not that complicated. The largest eigenvalue is ~100, the smallest is ~0.1.

Well I’m happy to give up totally automatically, but right now I think the ODE workflow is:

  1. Figure out in RK45 or BDF is better
  2. Figure out an appropriate tolerance

And the tool for figuring this out is some combination of guessing or running a whole model and looking at the output.

Here’s a hypothetical workflow (dunno if it’s good or not, but I think it’s better than above):

  1. Define priors
  2. Pick a few solver configurations
  3. Sample from the prior and see how solver performance/accuracy varies across the prior
  4. Pick the solver/tolerance settings from this trial
  5. Put them in your model and don’t worry about the solver any more

Yeah but I don’t know what that means lol. Are eigenvalues like this intrinsically good for RK45? Are they bad for adams? Why t_0 and t_{final} and not anything in the middle? Is it stiffness vs. non-stiffness? Is it high order vs. low order? Why would I come to the conclusion that this should run faster with the Adams-Moulton stuff than RK45/BDF?

I thought an RK method would be better for a LMM for the start at least because you don’t need the ODE history to get high order.

I’d put more focus on studying the ODEs: since user’s models almost exclusively come with some physics context,

it’s not too hard to analyze:

  1. The scale of the variables
  2. The trend of the changes
  3. The long term behavior
  4. Any singularities
  5. General expected behavior

It could be a warning to rk45(but not always a warranted warning). They’re not bad to sundials’ AM.

Because ts only consists of those two points and the solution near them are of different scales. It’s nice if one can analyze things in between, but many times user’s observations are sparse and those observations are where analysis should begin.

Not necessarily, large condition number doesn’t always imply stiffness.

It’s a combination of accuracy-controlled stepsize(all three schemes can handle) and stability-controlled stepsize(bdf & am are better at it) and order requirement(am are best at it).

To analyze the stability region of the methods involved and see how the eigenvalues restrict the time step, and to analyze the solutions at T_{\text{final}}. Also to keep in mind how BDF/AM perform order-update & stepsize-update(see cvodes manual).

That’s correct. But CVODES always(unless asked) starts with lower orders so it’s less of a problem.

1 Like