Adams Moulton ODE solver in Stan useful?

Whow…that was very constructive…so let’s park this simply.

Just tried this problem out in the new branch:

rk45 timed: 179593
bdf timed: 203776
adams timed: 64780

And with the new interfaces:

rk45 new timed: 161732
bdf new timed: 197372
adams new timed: 62082

So I tried the adams stuff on a version of the Lotka-Volterra case study and it was much slower than rk45.

Is it obvious why that would be the case (can post code if that seems wrong)?

For a same attainable order of accuracy and same time step, AM is more expensive because it’s implicit.

Well then why is RK45 slower in the problem above?

And docs say the Adams-Moulton method will vary between 1st and 12th order.

Because to attain the solution near t_0 and t_{\text{final}} rk45 needs to reduce time step aggressively, while AM doesn’t have to because it can enlist higher-order to reduce truncation error.

Here’s another example, related to the orbital dynamics that @s.maskell mentioned here. In the ODE system(2body), I’m using the default adams/bdf tolerance, and Adams is faster than the other two. Then I changed the ODE system to a trivial linear one defined by identity matrix, and use a “very low” tolerance, and this time adams is still faster. The example is intended to demonstrate that without involving the nature of ODE system, talking about “high” & “low” tolerance is meaningless, let alone if an integrator “is useful”.

class two_body {
 public:
  template <typename T0, typename T1, typename T2>
  inline std::vector<stan::return_type_t<T1, T2>>
  operator()(const T0& t_in, const std::vector<T1>& y,
             const std::vector<T2>& theta, const std::vector<double>& x,
             const std::vector<int>& x_int, std::ostream* msgs) const {
    if (y.size() != 4)
      throw std::domain_error("Functor called with inconsistent state");

    T1 r = sqrt(y[0]*y[0] + y[2]*y[2]);
    std::vector<stan::return_type_t<T1, T2>> dydt(4);
    dydt[0] = y[1];
    dydt[1] = -y[0]/pow(r, 3.0);
    dydt[2] = y[3];
    dydt[3] = -y[2]/pow(r, 3.0);
    return dydt;
  }
};

TEST(demo, two_body) {
  double t0 = 0.0;
  std::vector<double> ts{1.0, 100.0};
  std::vector<double> theta;
  std::vector<double> x_r;
  std::vector<int> x_i;

  {
    std::vector<stan::math::var> y0{2, 0, 0, 0.5};
    auto start = high_resolution_clock::now(); 
    std::vector<std::vector<stan::math::var>> ys = stan::math::integrate_ode_rk45(two_body(), y0, t0, ts, theta, x_r, x_i, 0, 1.e-10, 1.e-10, 100000);
    auto stop = high_resolution_clock::now(); 
    auto duration = duration_cast<microseconds>(stop - start);
    std::cout << "rk45 timed: " << duration.count() << "\n";
  }

  {
    std::vector<stan::math::var> y0{2, 0, 0, 0.5};
    auto start = high_resolution_clock::now(); 
    std::vector<std::vector<stan::math::var>> ys = stan::math::integrate_ode_bdf(two_body(), y0, t0, ts, theta, x_r, x_i, 0, 1.e-10, 1.e-10, 100000);
    auto stop = high_resolution_clock::now(); 
    auto duration = duration_cast<microseconds>(stop - start);
    std::cout << "bdf timed: " << duration.count() << "\n";
  }

  {
    std::vector<stan::math::var> y0{2, 0, 0, 0.5};
    auto start = high_resolution_clock::now(); 
    std::vector<std::vector<stan::math::var>> ys = stan::math::integrate_ode_adams(two_body(), y0, t0, ts, theta, x_r, x_i, 0, 1.e-10, 1.e-10, 100000);
    auto stop = high_resolution_clock::now(); 
    auto duration = duration_cast<microseconds>(stop - start);
    std::cout << "adams timed: " << duration.count() << "\n";
  }
}

class degen {
 public:
  template <typename T0, typename T1, typename T2>
  inline std::vector<stan::return_type_t<T1, T2>>
  operator()(const T0& t_in, const std::vector<T1>& y,
             const std::vector<T2>& theta, const std::vector<double>& x,
             const std::vector<int>& x_int, std::ostream* msgs) const {
    if (y.size() != 4)
      throw std::domain_error("Functor called with inconsistent state");

    T1 r = sqrt(y[0]*y[0] + y[2]*y[2]);
    std::vector<stan::return_type_t<T1, T2>> dydt(4);
    dydt[0] = -y[0];
    dydt[1] = -y[1];
    dydt[2] = -y[2];
    dydt[3] = -y[3];
    return dydt;
  }
};

TEST(demo, rk45_degen) {
  double t0 = 0.0;
  std::vector<double> ts{1.0, 100.0};
  std::vector<double> theta;
  std::vector<double> x_r;
  std::vector<int> x_i;

  {
    std::vector<stan::math::var> y0{2, 0, 0, 0.5};
    auto start = high_resolution_clock::now(); 
    std::vector<std::vector<stan::math::var>> ys = stan::math::integrate_ode_rk45(degen(), y0, t0, ts, theta, x_r, x_i, 0, 1.e-14, 1.e-16, 100000);
    auto stop = high_resolution_clock::now(); 
    auto duration = duration_cast<microseconds>(stop - start);
    std::cout << "rk45 timed: " << duration.count() << "\n";
  }

  {
    std::vector<stan::math::var> y0{2, 0, 0, 0.5};
    auto start = high_resolution_clock::now(); 
    std::vector<std::vector<stan::math::var>> ys = stan::math::integrate_ode_bdf(degen(), y0, t0, ts, theta, x_r, x_i, 0, 1.e-14, 1.e-16, 100000);
    auto stop = high_resolution_clock::now(); 
    auto duration = duration_cast<microseconds>(stop - start);
    std::cout << "bdf timed: " << duration.count() << "\n";
  }

  {
    std::vector<stan::math::var> y0{2, 0, 0, 0.5};
    auto start = high_resolution_clock::now(); 
    std::vector<std::vector<stan::math::var>> ys = stan::math::integrate_ode_adams(degen(), y0, t0, ts, theta, x_r, x_i, 0, 1.e-14, 1.e-16, 100000);
    auto stop = high_resolution_clock::now(); 
    auto duration = duration_cast<microseconds>(stop - start);
    std::cout << "adams timed: " << duration.count() << "\n";
  }
}

output

mpirun -np 1 test/unit/math/rev/integrate_ode_adams_2_perf_test --gtest_output="xml:test/unit/math/rev/integrate_ode_adams_2_perf_test.xml"
Running main() from lib/gtest_1.8.1/src/gtest_main.cc
[==========] Running 2 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 2 tests from demo
[ RUN      ] demo.two_body
rk45 timed: 12119
bdf timed: 16825
adams timed: 10083
[       OK ] demo.two_body (39 ms)
[ RUN      ] demo.rk45_degen
rk45 timed: 5143
bdf timed: 5859
adams timed: 2391
[       OK ] demo.rk45_degen (13 ms)
[----------] 2 tests from demo (52 ms total)

[----------] Global test environment tear-down
[==========] 2 tests from 1 test case ran. (53 ms total)
[  PASSED  ] 2 tests.

In this last model, let’s look at something that @charlesm93 had worked on.

struct TwoCptNeutModelODE {
  template <typename T0, typename T1, typename T2>
  inline std::vector<typename stan::return_type<T1, T2>::type>
  operator()(const T0& t_in,
             const std::vector<T1>& x,
             const std::vector<T2>& theta,
             const std::vector<double>& x_r,
             const std::vector<int>& x_i,
             std::ostream* msgs) const {
    using scalar_type = typename stan::return_type<T1, T2>::type;

    const T2& CL    = theta[0];
    const T2& Q     = theta[1];
    const T2& V1    = theta[2];
    const T2& V2    = theta[3];
    const T2& ka    = theta[4];
    const T2& mtt   = theta[5];
    const T2& circ0 = theta[6];
    const T2& gamma = theta[7];
    const T2& alpha = theta[8];

    T2 k10 = CL / V1;
    T2 k12 = Q / V1;
    T2 k21 = Q / V2;
    T2 ktr = 4 / mtt;
    
    std::vector<scalar_type> dxdt(8);
    dxdt[0] = -ka * x[0];
    dxdt[1] = ka * x[0] - (k10 + k12) * x[1] + k21 * x[2];
    dxdt[2] = k12 * x[1] - k21 * x[2];    
    scalar_type conc = x[1]/V1;
    scalar_type EDrug = alpha * conc;

    // x[4], x[5], x[6], x[7] and x[8] are differences from circ0.
    scalar_type prol = x[3] + circ0;
    scalar_type transit1 = x[4] + circ0;
    scalar_type transit2 = x[5] + circ0;
    scalar_type transit3 = x[6] + circ0;
    scalar_type circ = stan::math::fmax(stan::math::machine_precision(), x[7] + circ0);

    dxdt[3] = ktr * prol * ((1 - EDrug) * (pow(circ0 / circ, gamma) - 1));
    dxdt[4] = ktr * (prol - transit1);
    dxdt[5] = ktr * (transit1 - transit2);
    dxdt[6] = ktr * (transit2 - transit3);
    dxdt[7] = ktr * (transit3 - circ);

    return dxdt;
  }
};

with

    ts        {0.083, 0.167, 0.25, 0.5, 0.75, 1, 1.5, 2,3,4,6,8,12,24,36,48,60,72,1000},
    theta     {10, 15, 35, 105, 2, 10, 5, 0.17, 2.0e-4},
    y0        {100.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0}

and this time I’m using a tolerance even lower than the default.

TEST_F(TorstenOdeTest_neutropenia, rk45_demo) {
  vector<var> theta_var = stan::math::to_var(theta);
  auto start = high_resolution_clock::now(); 
  vector<vector<var> > y = integrate_ode_rk45(f, y0, t0, ts, theta_var, x_r, x_i,0, 1.e-7, 1.e-8, 1000000);
  auto stop = high_resolution_clock::now(); 
  auto duration = duration_cast<microseconds>(stop - start);
  std::cout << "rk45 timed: " << duration.count() << "\n";
}

TEST_F(TorstenOdeTest_neutropenia, BDF_demo) {
    vector<var> theta_var = stan::math::to_var(theta);
    auto start = high_resolution_clock::now(); 
    vector<vector<var> > y = integrate_ode_bdf(f, y0, t0, ts, theta_var, x_r, x_i,0, 1.e-7, 1.e-8, 1000000);
    auto stop = high_resolution_clock::now(); 
    auto duration = duration_cast<microseconds>(stop - start);
    std::cout << "BDF timed: " << duration.count() << "\n";
}

TEST_F(TorstenOdeTest_neutropenia, AM_demo) {
    vector<var> theta_var = stan::math::to_var(theta);
    auto start = high_resolution_clock::now(); 
    vector<vector<var> > y = integrate_ode_adams(f, y0, t0, ts, theta_var, x_r, x_i,0, 1.e-7, 1.e-8, 1000000);
    auto stop = high_resolution_clock::now(); 
    auto duration = duration_cast<microseconds>(stop - start);
    std::cout << "AM timed: " << duration.count() << "\n";
}

Output

mpirun -np 1 test/unit/math/rev/integrate_ode_adams_2_perf_test --gtest_output="xml:test/unit/math/rev/integrate_ode_adams_2_perf_test.xml"
Running main() from lib/gtest_1.8.1/src/gtest_main.cc
[==========] Running 3 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 3 tests from TorstenOdeTest_neutropenia
[ RUN      ] TorstenOdeTest_neutropenia.rk45_demo
rk45 timed: 10968
[       OK ] TorstenOdeTest_neutropenia.rk45_demo (11 ms)
[ RUN      ] TorstenOdeTest_neutropenia.BDF_demo
BDF timed: 4918
[       OK ] TorstenOdeTest_neutropenia.BDF_demo (5 ms)
[ RUN      ] TorstenOdeTest_neutropenia.AM_demo
AM timed: 5416
[       OK ] TorstenOdeTest_neutropenia.AM_demo (6 ms)
[----------] 3 tests from TorstenOdeTest_neutropenia (22 ms total)

[----------] Global test environment tear-down
[==========] 3 tests from 1 test case ran. (22 ms total)
[  PASSED  ] 3 tests.

In the context of a Stan program there is an implicit definition of very high and very low tolerance, since the problems should be scaled to unity scale roughly. Only then Stan programs work well. Or do you think that is not meaningful?

You mention that it is a problem for rk45 to output at some specific time points due to the need to exactly hit the time point, but I think our odeint implementation avoids this problem as the integrator we use implements a so called “dense output”:

Dense output steppers also can interpolate the solution to calculate the state x(t’) at any point t <= t’ <= t+dt .

The examples you show look nice, but I wonder how stable AM is against wild parameters being tried. Maybe AM is good as a non stiff solver after running warmup with bdf.

If Stan can’t set these options automatically, then that leaves it to someone writing the model.

Right now the only option seems to be writing Stan models with the different methods and benchmarking them separately.

That isn’t great. I wonder if there is something we can do to smooth the workflow.

How do people pick their ODE methods regularly? What’s the process?

Where do I say that??

I think he’s referring to this comment, where there’s some special difficulty for rk45 near t_0 vs. t_{final}. Wasn’t totally clear to me what was happening there either.

If you mean picking integrate scheme automatically, it’s an ambitious goal that requires many efforts. IMO it’s as “not great” as we cannot choose prior automatically, as both depends on the nature the model. I don’t think there’s any solver suite provides anything beyond very general guidance.

C’mon guys, did you take a look of the ODEs? It’s linear and not that complicated. The largest eigenvalue is ~100, the smallest is ~0.1.

Well I’m happy to give up totally automatically, but right now I think the ODE workflow is:

  1. Figure out in RK45 or BDF is better
  2. Figure out an appropriate tolerance

And the tool for figuring this out is some combination of guessing or running a whole model and looking at the output.

Here’s a hypothetical workflow (dunno if it’s good or not, but I think it’s better than above):

  1. Define priors
  2. Pick a few solver configurations
  3. Sample from the prior and see how solver performance/accuracy varies across the prior
  4. Pick the solver/tolerance settings from this trial
  5. Put them in your model and don’t worry about the solver any more

Yeah but I don’t know what that means lol. Are eigenvalues like this intrinsically good for RK45? Are they bad for adams? Why t_0 and t_{final} and not anything in the middle? Is it stiffness vs. non-stiffness? Is it high order vs. low order? Why would I come to the conclusion that this should run faster with the Adams-Moulton stuff than RK45/BDF?

I thought an RK method would be better for a LMM for the start at least because you don’t need the ODE history to get high order.

I’d put more focus on studying the ODEs: since user’s models almost exclusively come with some physics context,

it’s not too hard to analyze:

  1. The scale of the variables
  2. The trend of the changes
  3. The long term behavior
  4. Any singularities
  5. General expected behavior

It could be a warning to rk45(but not always a warranted warning). They’re not bad to sundials’ AM.

Because ts only consists of those two points and the solution near them are of different scales. It’s nice if one can analyze things in between, but many times user’s observations are sparse and those observations are where analysis should begin.

Not necessarily, large condition number doesn’t always imply stiffness.

It’s a combination of accuracy-controlled stepsize(all three schemes can handle) and stability-controlled stepsize(bdf & am are better at it) and order requirement(am are best at it).

To analyze the stability region of the methods involved and see how the eigenvalues restrict the time step, and to analyze the solutions at T_{\text{final}}. Also to keep in mind how BDF/AM perform order-update & stepsize-update(see cvodes manual).

That’s correct. But CVODES always(unless asked) starts with lower orders so it’s less of a problem.

1 Like

These points make sense.

I have trouble turning all of them into action items however.

Like, I know if I look at a variable, and it’s clearly some large, positive number, that relative tolerance is the thing that will matter for it.

Likewise, I know if a variable will be near zero, absolute tolerance will matter. So I can somehow translate scale into things to do in my workflow.

I suppose I can look for divide by zeros and such in the right hand side to find singularities.

However I am still stuck making a decision between RK45, Adams, and BDF for purposes of accuracy, stability, and order requirements. I might be able to pick a tolerance for any given solver, but not the solver itself.

Why is order requirement separate from accuracy?

Well, my suspicion is that people don’t do this manually. Like I don’t even know what the stability region of the CVODES BDF solver would be anyway! I’d have to spent a lot of time with an ODE book and the CVODES doc.

Also I suspect that sampling from a prior and seeing if the solver works is a more reliable method of finding problems that trying to analyze a linearization or something.

I’m more thinking outloud here than anything else. Not trying to come to any conclusion, just trying to get the puzzle pieces to fit on what an ODE workflow might look more like.

There’s also the difficulty of solving ODEs outside of Stan vs. inside of Stan. I wonder if it is difficult for people to generalize what they’ve learned about ODE solvers outside of Stan into Stan? Like do they spend all their time frustrated we haven’t documented our methods better, etc.? Or is it all fine?

\|y_n-y(t_n)\| = Ch^{p+1}y^{(p+1)} + O(h^{(p+2)}) for an order-p method, to improve accuracy one can either increase p or reduce h(stepsize).

The thread seems wondering off the topic of AM solver. Maybe we can do it in a new one. In general I’m fine with the idea of picking solver based on test runs with priors, it’s just I don’t see how it can be a robust and quantifiable process.

I just wanted to ask when it’s on the mind.

It comes back to Adam’s Moulton cuz me and @wds15 both tried it on non-stiff methods and it was slow, so we were missing something.

Sounds like rk45 had good stability properties on the systems we tried things on and so it was fast (and implicit method unnecessary). I don’t know how to know this beforehand.

1 Like

We’ll need a better term than “stability” because it could mean several things. Anyways, in general rk45 is a method choice if one walks into the room knowing nothing about the ODE, and often our ODE are very nice and not even close to stiffness. However, rk45 is not known for stability but

  • medium order(4/5)
  • relatively efficient(FSAL + local interpolation)
  • two orders so friendly to stepsize adjustment(auto step)

and, this is what I suspect why most users choose it

  • being explicit.

Now for stability, rk45 is inferior to rk23(Bogacki–Shampine) but better than Adams-Bashford(higher-order but much less stable), and I’d say if we replace rk45 with Adams-Bashford or even forward-Euler, it’s still going to be people’s favorite because these explicit methods almost always give you something(max_num_step reached! so what? increase it!).

1 Like