Is there any possibility of using an optimization algorithm such as L-BFGS inside the likelihood evaluation in Stan? I had a conversation with Professor Andriy Norets from the Department of Economics at Brown during the ISBA meeting. He told me that he had tried Stan for one of his research projects but found that his model could not be implemented in it. We exchanged follow-up emails, and he provided more details about the obstacles:
Both issues (solving a nonlinear optimization problem and a
combination of HMC and reversible jump) come up in the following
paper: paper link
Abstractly speaking, in this paper, the likelihood function depends on
the parameter through a function, say V(theta), that can be evaluated
by solving a nonlinear equation F(V,theta)=0.
The map F is differentiable, so the derivative of V(theta) with respect to
theta can be computed by the implicit function theorem. Then we use
the MATLAB HMC package to simulate from the posterior.
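To make the setup in the quoted email concrete, here is a minimal numpy sketch with a scalar F of my own choosing (not the paper's actual model): solve F(V, theta) = 0 numerically, recover dV/dtheta from the implicit function theorem, and check it against a finite difference of the solver itself.

```python
import numpy as np

def F(V, theta):
    # Implicit equation defining V(theta): F(V, theta) = 0 (toy choice).
    return V**3 + V - theta

def solve_V(theta, V0=0.0, tol=1e-12):
    # Newton's method on F(., theta).
    V = V0
    for _ in range(100):
        step = F(V, theta) / (3.0 * V**2 + 1.0)  # F / (dF/dV)
        V -= step
        if abs(step) < tol:
            break
    return V

theta = 2.0
V = solve_V(theta)  # here V = 1, since 1 + 1 = 2

# Implicit function theorem: dV/dtheta = -(dF/dV)^{-1} * dF/dtheta.
dF_dV = 3.0 * V**2 + 1.0
dF_dtheta = -1.0
dV_dtheta = -dF_dtheta / dF_dV  # = 0.25 at theta = 2

# Check against a central finite difference of the solver.
eps = 1e-6
fd = (solve_V(theta + eps) - solve_V(theta - eps)) / (2.0 * eps)
print(abs(dV_dtheta - fd) < 1e-6)  # True
```

The key point is that the derivative never differentiates through the Newton iterations themselves; only the solution and the partials of F at the solution are needed.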
Apart from the optimization problem, he also mentioned the difficulty of implementing “reversible jump” in Stan. I think it is better to focus on one problem per post, so I will skip that part here. I believe that allowing optimization inside the likelihood evaluation would be useful for some economic models, but I can also imagine that it would make the automatic differentiation more difficult. Any thoughts on this topic? Thank you!
As far as I can remember, Stan is capable of autodiffing through the algebraic solver via the implicit function theorem, though some issues with it were fixed recently, so you’ll want to be on the latest version.
That’s right: Stan can’t handle a posterior whose number of parameters changes, which rules out reversible jump.
That’s right. We have an algebraic solver that can solve non-linear equations and it uses the implicit function theorem for autodiff. @charlesm93 and @betanalpha wrote a paper on autodiff with implicit functions.
There were plans to extend that to differential algebraic equations, but I don’t know how far along that work is.
Indeed, Stan supports two algebraic solvers: a Newton solver built around KINSOL and a dogleg solver using one of Eigen’s (unsupported) modules. I recommend KINSOL. Neither is an L-BFGS solver, but the automatic differentiation works the same way regardless of which solver you use.
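This is not KINSOL or the dogleg solver, but the core Newton iteration that such solvers are built around can be sketched in numpy for a hypothetical 2×2 system of my own choosing:

```python
import numpy as np

def F(x):
    # A small nonlinear system; x = (1, 2) is one of its roots.
    return np.array([x[0] + x[1] - 3.0, x[0] * x[1] - 2.0])

def J(x):
    # Jacobian of F.
    return np.array([[1.0, 1.0], [x[1], x[0]]])

def newton(x0, tol=1e-12, max_iter=50):
    # Basic Newton iteration: solve J(x) dx = -F(x), then update x.
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        dx = np.linalg.solve(J(x), -F(x))
        x += dx
        if np.linalg.norm(dx) < tol:
            break
    return x

root = newton([0.5, 2.5])
print(np.allclose(F(root), 0.0, atol=1e-10))  # True
```

Production solvers add globalization (line searches or trust regions, as in the dogleg method) so the iteration converges from poor starting points, but the linear solve against the Jacobian is the same.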
The autodiff implementation combines the implicit function theorem with an adjoint method. The latter was implemented by @jgaeb, who wrote a very nice blog post on the subject.
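That combination can be sketched in plain numpy (toy F and downstream scalar g of my own choosing, not Stan's implementation): the implicit function theorem gives dV/dtheta, and the adjoint trick avoids ever forming that full Jacobian, replacing it with a single linear solve against the transposed ∂F/∂V.

```python
import numpy as np

def F(V, theta):
    # Implicit system defining V(theta) in R^2 (toy choice).
    return np.array([V[0] + V[1]**3 - theta[0],
                     V[0]**3 + V[1] - theta[1]])

def dF_dV(V):
    # Jacobian of F with respect to V.
    return np.array([[1.0, 3.0 * V[1]**2],
                     [3.0 * V[0]**2, 1.0]])

dF_dtheta = -np.eye(2)  # F depends on theta only through -theta here

def solve_V(theta):
    # Newton iteration for F(V, theta) = 0.
    V = np.zeros(2)
    for _ in range(100):
        step = np.linalg.solve(dF_dV(V), -F(V, theta))
        V += step
        if np.linalg.norm(step) < 1e-12:
            break
    return V

def g(V):
    # Scalar downstream quantity (e.g. a log-likelihood term).
    return V[0]**2 + V[1]

theta = np.array([2.0, 2.0])
V = solve_V(theta)  # by symmetry V = (1, 1)

# Adjoint method: one linear solve instead of the full Jacobian dV/dtheta.
g_V = np.array([2.0 * V[0], 1.0])       # dg/dV at the solution
lam = np.linalg.solve(dF_dV(V).T, g_V)  # adjoint solve with (dF/dV)^T
grad = -lam @ dF_dtheta                 # dg/dtheta

# Check against finite differences of the composed map theta -> g(V(theta)).
eps = 1e-6
fd = np.array([(g(solve_V(theta + eps * e)) - g(solve_V(theta - eps * e)))
               / (2.0 * eps) for e in np.eye(2)])
print(np.allclose(grad, fd, atol=1e-6))  # True
```

In reverse-mode autodiff the payoff is that the cost of the gradient is one extra linear solve per scalar output, independent of the dimension of theta.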
There are some extra steps to go from an autodiffed algebraic solver to an autodiffed optimizer (see Section 2.3.3 of the paper Michael and I wrote), but if you’re able and willing to hand-code the gradient of your objective function, you should be able to use Stan as is.
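A scalar sketch of those extra steps, with an objective of my own choosing (not from the paper): treat the optimizer's stationarity condition as the algebraic equation, supply the hand-coded gradient, and the implicit function theorem then differentiates through the argmin.

```python
import numpy as np

def grad_f(V, theta):
    # Hand-coded gradient of the objective f(V, theta) = (V - theta)**2 + V**4.
    return 2.0 * (V - theta) + 4.0 * V**3

def hess_f(V):
    # Hand-coded second derivative of f in V.
    return 2.0 + 12.0 * V**2

def argmin_V(theta):
    # Minimize f over V by Newton's method on the gradient:
    # the optimum satisfies grad_f(V, theta) = 0, which plays the
    # role of the algebraic equation F(V, theta) = 0.
    V = 0.0
    for _ in range(100):
        step = grad_f(V, theta) / hess_f(V)
        V -= step
        if abs(step) < 1e-12:
            break
    return V

theta = 1.0
V = argmin_V(theta)

# Implicit function theorem on the stationarity condition:
# dV/dtheta = -(d2f/dV2)^{-1} * d2f/dVdtheta, and here d2f/dVdtheta = -2.
dV_dtheta = 2.0 / hess_f(V)

# Check against a finite difference of the argmin itself.
eps = 1e-6
fd = (argmin_V(theta + eps) - argmin_V(theta - eps)) / (2.0 * eps)
print(abs(dV_dtheta - fd) < 1e-6)  # True
```

The role of ∂F/∂V is now played by the Hessian of the objective, which is why having the gradient in closed form makes the whole construction usable as is.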
These solvers were originally motivated by steady-state models in pharmacometrics. See the StanCon 2018 notebook for an example.
Right. Our goal was to provide a general and formal framework for implicit functions (see this Twitter post for an overview).
These concepts also play a role in the adjoint-differentiated Laplace approximation. There, I didn’t treat the optimizer as a black box (following the example of Rasmussen & Williams (2006)): terms computed during the final Newton step of the optimizer are reused to calculate the gradient. Crucially, this avoids computing a Cholesky (or LU) decomposition, which would otherwise dominate the cost of the gradient, as shown by @jgaeb (here). For now, to get those implementation details, you have to dig into Chapter 5 of my thesis.