Newton solver in Stan



Stan’s current method for solving systems of (nonlinear) algebraic equations is Powell’s dogleg method, as implemented in Eigen. There have been a few examples where Newton’s method is faster, sometimes by up to an order of magnitude. Examples include Precise but wrong results for a drug interaction model and Algebraic solver problems (Laplace approximation in Stan), plus some other experiments I conducted myself.

This warrants creating a new solver. I’ll detail the differences between the two solvers, along with recommendations on when to use which. Before opening an issue on GitHub, however, I wanted to start a conversation here. The way I see it, Newton’s solver would have the same signature as Powell’s solver. We may, however, want to add other options, such as an adaptive step size.


Is this a different Newton’s method than the optimizer we currently have? If it requires second derivatives, how do you propose to get them?

I don’t know who the best person is to talk to about these algorithms. If it were me, I’d ask @Marcus_Brubaker, @betanalpha, and @bgoodri.


Same idea but @charlesm93 is talking about something to use inside a Stan program rather than applied to a Stan program.


OK, is there any way we could use a common implementation? One of the things I disliked about the ADVI implementation was the embedded implementation of an optimizer. I’d prefer to pull out a single implementation to maintain, even if it needs slightly different interfaces for optimizing a Stan program and whatever the use is here.


When solving an algebraic equation, Newton’s solver is a first-order method: it only needs the Jacobian of the algebraic function. It becomes a second-order method for an optimization problem, because there the algebraic equation is constructed by taking the derivative of the objective function, so the solver ends up requiring second derivatives of the objective.
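To make the derivative orders concrete, here is a minimal scalar sketch (Python for illustration only; Stan’s actual solver would be C++ against Eigen, and the function names here are hypothetical): solving g(x) = 0 directly needs only g′, while minimizing f via Newton means g = f′, so the same update implicitly uses f″.

```python
def newton_solve(g, g_prime, x0, tol=1e-10, max_iter=50):
    """Newton's method for the scalar algebraic equation g(x) = 0.

    Only first derivatives of g are needed (the Jacobian, in the
    multivariate case).
    """
    x = x0
    for _ in range(max_iter):
        step = g(x) / g_prime(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Solving an algebraic equation directly: g(x) = x^2 - 2.
root = newton_solve(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)

# Minimizing f(x) = (x - 3)^2 via Newton: the algebraic equation is
# g = f' = 2(x - 3), so g' = f'' -- second derivatives of the objective.
minimum = newton_solve(lambda x: 2.0 * (x - 3.0), lambda x: 2.0, x0=0.0)
```

The iteration itself is identical in both cases; only where g comes from changes which order of derivative of the original problem is required.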

I can look at the Newton solver used for ADVI and link it to the new solver.


+1! Charles: maybe you, @betanalpha, and I should get together next week and have a quick look at how we should build a more unified algorithms design (so that the non-HMC stuff is at least as well designed as the HMC stuff).

What do you mean by “second order” here? It still has quadratic convergence (locally) in both cases, which is the usual meaning of second order in numerical analysis.
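A quick numerical illustration of that local quadratic convergence (my own toy example, not from this thread): running Newton on g(x) = x² − 2 roughly doubles the number of correct digits each iteration, and this behavior does not depend on whether g came from an optimization problem.

```python
# Newton iteration for g(x) = x^2 - 2; track the error against sqrt(2).
x = 1.0
target = 2.0 ** 0.5
errors = []
for _ in range(4):
    x -= (x * x - 2.0) / (2.0 * x)  # Newton step: x - g(x) / g'(x)
    errors.append(abs(x - target))
# Quadratic convergence: each error is bounded by the square of the
# previous one (e_{k+1} <= C * e_k^2, here with C close to 1).
```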


Oops, I mixed up some technical terms. I’m talking about the order of the derivative that needs to be computed.

And yes, happy to meet up next week and work this and other things out with Michael.


Cool! Sorry I got confused! I’m there on Wednesday and we can chat to work out a time then.


The “order” in this context has been confusing us for years!