Newton solver in Stan

Are you talking about this PR?

There were just a few for loops that could be removed. Did that goof something on your end?

That PR should show the differences, but the main thing was just being a bit more careful with looping over the Eigen matrices. It also uses the new .val() and .adj() methods for Eigen matrices to pull out the values and adjoints of matrices of vars/fvars


That’s not the PR I had in mind (though, great PR!). The last change to algebra_solver.hpp is from a commit described as “replace typedef with using and add nullptr”. I’ll check out the PR and pin you if I have questions.

Ah! Yeah, so that was just replacing things in the code that used the typedef keyword with the using keyword. That’s just stylistic. Everywhere else in C++ the name being declared goes on the left-hand side of the assignment, aka using namey = my_thing, but typedef is one of the few places where it’s the opposite, a la typedef my_thing namey. For declaring type aliases, using and typedef do the same thing, with using having a few niceties around it

Using raw NULL or 0 can have some very weird and esoteric side effects with templates and function overload resolution, so it’s nicer to use nullptr. You can see a further 'splainer here


While this is in review, I’d like to tag @Teddy_Groves as he has shown in StanCon that a model of his might take advantage of KINSOL, the sparse fixed point solver part in particular. @Teddy_Groves, anywhere we can find your presentation? Not that it’ll be available soon but your model could be a good motivation.


Hi! You can find my model, notebook and presentation in this repository. The relevant Stan code is here, where I’ve tried to implement the method @charlesm93 recommended to me at StanCon for finding steady states.

The problem it uses the algebra solver for is to find steady state metabolite concentrations for a network of metabolic reactions, given a configuration of kinetic parameters which (together with the metabolite concentrations) determine the reactions’ rates. It is sparse in the sense that a metabolite concentration’s rate of change depends only on the rates of a few reactions (the ones that consume and produce the metabolite), which are known in advance. My team thinks it should be possible to exploit this information to speed up the steady state calculation.

Hopefully the binder button in the repository readme should let you view and run the notebook - let me know if not!


@Teddy_Groves1 thanks for sharing!

It’s good to see an application, and it gives us another case for comparing various algorithms. Once the above PR is approved, we can change a single line of code, i.e. algebra_solver() --> algebra_solver_newton(), and see whether it yields a performance gain (I’m very confident it does). Then we’ll do fancier things as this moves on.


I thought I’d share this here.

We have one unit test where the Newton solver fails (while the dogleg solver does fine). The function at hand is:

f_1 = (x_1 - y_1)(x_2 - y_2)
f_2 = (x_1 - y_2)(x_2 - y_1)

This is a degenerate problem with two roots: x^*_a = (y_1, y_1) and x^*_b = (y_2, y_2). The initial guess should determine which root the solver converges to. Specifically, y_1 = 5 and y_2 = 100. If x_0 = (1, 1), then the solver finds the first solution. If x_0 = (80, 80) the solver finds the second solution. So the initial guess determines the neighborhood of the solution.

Now if x_0 = (5, 100), I get an error from the solver, namely flag -11: "The linear solver's setup function failed in an unrecoverable manner." If I instead use DQ (finite differencing) to calculate the Jacobian at each step, I get flag -5: "The linesearch algorithm was unable to find an iterate sufficiently distinct from the current iterate." My guess is that the first element of x_0 is so far from the first solution (equivalently, the second element is far from the other solution) that the solver struggles to move smoothly towards either one. Basically, this is a non-convex problem, and it might not be surprising that the Newton solver struggles with it.

Any other ideas as to what might be going on? Also, how concerned are we about this behavior, i.e. is this a unit test we want to keep in there?

I’m fine if the Newton solver is guaranteed to work only in the convex case. We could have another solver for the non-convex case. Note that the Laplace approximation can produce insensible results in the non-convex case even if the solver works, and the marginal likelihood is likely to have discontinuities if there is more than one solution.


Do you know why the two approaches emit different error messages? An "unrecoverable" message is about as informative as no message at all, and it’ll be difficult to diagnose when the model is complicated. This is my main concern in the GitHub review: the DQ approach’s message should be consistent with AD’s.

Is the Jacobian non-zero and finite at the starting point?

J = \left [ \begin{matrix} x_2 - y_2 & x_1 - y_1 \\ x_2 - y_1 & x_1 - y_2 \end{matrix} \right ]

so at x_0 = (5, 100) and y = (5, 100), you have

J = \left [ \begin{matrix} 0 & 0 \\ 95 &-95 \end{matrix} \right ]

but the Jacobian doesn’t help you much, since it’s singular at this starting point between the two roots.

I’m not sure.

My guess (based on browsing the KINSOL manual) is that the setup function is only used to “preprocess” Jacobian data when the user specifies a Jacobian function. A different approach is used for the linear solve with the default DQ.

We could have another solver for non-convex case

@avehtari: note that the dogleg solver (currently in Stan) passes this test.

I just ran the test

TEST_F(degenerate_eq_test, debug) {
  Eigen::VectorXd theta = degenerate_test(y_scale, x_guess_3, true);
}

with KINSetJacFn commented out. I’m assuming this gives me the Jacobian calculated from DQ, and I’m still getting -11, not -5 as you observed. Am I doing it wrong?

From an optimization point of view, the initialization is a singular (saddle) point for some scalar function, so Newton should have trouble. What I’d like to find out is what causes the different error messages under the different Jacobian calculation methods.

@yizhang I definitely get flag -5 after commenting out KINSetJacFn. Maybe you modified other arguments in kinsol_solver. For example, if I set steps_eval_jacobian = 1, I get flag -11 (that’s just a wild guess at what might be happening).

Send me your code, I’ll run it on my machine.

See the branch

with debug test

bash-3.2$ test/unit/math/rev/mat/functor/algebra_solver_newton_debug_test
Running main() from lib/gtest_1.8.1/src/gtest_main.cc
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from degenerate_eq_test
[ RUN      ] degenerate_eq_test.debug
unknown file: Failure
C++ exception with description "algebra_solver failed with error flag -11." thrown in the test body.
[  FAILED  ] degenerate_eq_test.debug (0 ms)
[----------] 1 test from degenerate_eq_test (0 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (1 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] degenerate_eq_test.debug

 1 FAILED TEST

I recover your result when running your test.

That said, if I run the following test (i.e. the original test), appended to your script, I get flag -5:

TEST_F(degenerate_eq_test, newton_dbl) {
    bool is_newton = true;

    // This first initial guess produces the
    // solution x = {8, 8}
    Eigen::VectorXd theta = degenerate_test(y_dbl, x_guess_1, is_newton);
    EXPECT_FLOAT_EQ(8, theta(0));
    EXPECT_FLOAT_EQ(8, theta(1));
    
    // This next initial guess produces the
    // solution x = {5, 5}
    theta = degenerate_test(y_dbl, x_guess_2, is_newton);
    EXPECT_FLOAT_EQ(5, theta(0));
    EXPECT_FLOAT_EQ(5, theta(1));
    
    // See if the initial guess determines neighborhood of the
    // solution, when solutions have different scales,
    // using y_scale.
    theta = degenerate_test(y_scale, x_guess_2, is_newton);
    EXPECT_FLOAT_EQ(5, theta(0));
    EXPECT_FLOAT_EQ(5, theta(1));
    
    theta = degenerate_test(y_scale, x_guess_3, is_newton);
    EXPECT_FLOAT_EQ(100, theta(0));
    EXPECT_FLOAT_EQ(100, theta(1));
    // NOTE: causes the Newton solver to send a flag -11 error.
}

Still from your branch, if I run the regular test (algebra_solver_test.cpp), I get flag -5 for the degenerate_eq_test, plus a whole bunch of errors on other tests; my guess is DQ doesn’t do well for some of the systems.

Now, what’s happening is that previous calls to algebra_solver_newton() affect subsequent calls. You can play around with this by commenting out some of the degenerate_eq_test() calls; you’ll sometimes get flag -11 and sometimes flag -5. This suggests there may be some memory leak, and that’s what I’m looking at now.

This suggests there may be some memory leak, and this is what I’m looking at now.

@yizhang: Nothing jumps out. I think you caught all the leaks during your review of the code.

It’s not necessarily a memory leak. If the implementation keeps state across calls, this could happen.

Just a hunch: have you identified which test emits -5 and which emits -11? I’m asking because you have multiple tests in a single unit test, and any one of them could throw an exception when you switch from AD to DQ. The unit test I was using has a single test, and it matches what AD gives.