Is there any possibility of using an optimization algorithm such as L-BFGS inside the likelihood evaluation in Stan? I had a conversation with Professor Andriy Norets from the Department of Economics at Brown during the ISBA meeting. He told me that he had tried Stan for one of his research projects but found that his model could not be implemented in it. We exchanged follow-up emails, and he provided more details about the obstacles:
Both issues (solving a nonlinear optimization problem and a
combination of HMC and reversible jump) come up in the following
paper: paper link
Abstractly speaking, in this paper, the likelihood function depends on
the parameter through a function, say V(theta), that can be evaluated
by solving a nonlinear equation F(V,theta)=0.
The map F is differentiable, so the derivative of V(theta) with respect to
theta can be computed by the implicit function theorem. Then we use
the MATLAB HMC package to simulate from the posterior.
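To make the setup in the quoted email concrete, here is a minimal numpy sketch with a scalar F of my own choosing (not the paper's actual model): solve F(V, theta) = 0 numerically, recover dV/dtheta from the implicit function theorem, and check it against a finite difference of the solver itself.

```python
import numpy as np

def F(V, theta):
    # Implicit equation defining V(theta): F(V, theta) = 0 (toy choice).
    return V**3 + V - theta

def solve_V(theta, V0=0.0, tol=1e-12):
    # Newton's method on F(., theta).
    V = V0
    for _ in range(100):
        step = F(V, theta) / (3.0 * V**2 + 1.0)  # F / (dF/dV)
        V -= step
        if abs(step) < tol:
            break
    return V

theta = 2.0
V = solve_V(theta)  # here V = 1, since 1 + 1 = 2

# Implicit function theorem: dV/dtheta = -(dF/dV)^{-1} * dF/dtheta.
dF_dV = 3.0 * V**2 + 1.0
dF_dtheta = -1.0
dV_dtheta = -dF_dtheta / dF_dV  # = 0.25 at theta = 2

# Check against a central finite difference of the solver.
eps = 1e-6
fd = (solve_V(theta + eps) - solve_V(theta - eps)) / (2.0 * eps)
print(abs(dV_dtheta - fd) < 1e-6)  # True
```

The key point is that the derivative never differentiates through the Newton iterations themselves; only the solution and the partials of F at the solution are needed.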
Apart from the optimization problem, he also mentioned the difficulty of implementing “reversible jump” in Stan. I think it is better to focus on one problem per post, so I will skip that part here. I believe that allowing optimization inside the likelihood evaluation would be useful for some economic models, but I can also imagine that it would make the automatic differentiation more difficult. Any thoughts on this topic? Thank you!
As far as I can remember, Stan is capable of autodiffing through the algebraic solver via the implicit function theorem, though some issues with it were fixed recently, so you’ll want to be on the latest version.
That’s right: Stan can’t handle a posterior whose number of parameters changes, which rules out reversible jump.
That’s right. We have an algebraic solver that can solve non-linear equations and it uses the implicit function theorem for autodiff. @charlesm93 and @betanalpha wrote a paper on autodiff with implicit functions.
There were plans to extend that to differential algebraic equations, but I don’t know how far along that work is.
Indeed, Stan supports two algebraic solvers: a Newton solver built around KINSOL and a dogleg solver using one of Eigen’s (unsupported) modules. I recommend KINSOL. Neither is an L-BFGS solver, but the automatic differentiation works the same way regardless of which solver you use.
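This is not KINSOL or the dogleg solver, but the core Newton iteration that such solvers are built around can be sketched in numpy for a hypothetical 2×2 system of my own choosing:

```python
import numpy as np

def F(x):
    # A small nonlinear system; x = (1, 2) is one of its roots.
    return np.array([x[0] + x[1] - 3.0, x[0] * x[1] - 2.0])

def J(x):
    # Jacobian of F.
    return np.array([[1.0, 1.0], [x[1], x[0]]])

def newton(x0, tol=1e-12, max_iter=50):
    # Basic Newton iteration: solve J(x) dx = -F(x), then update x.
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        dx = np.linalg.solve(J(x), -F(x))
        x += dx
        if np.linalg.norm(dx) < tol:
            break
    return x

root = newton([0.5, 2.5])
print(np.allclose(F(root), 0.0, atol=1e-10))  # True
```

Production solvers add globalization (line searches or trust regions, as in the dogleg method) so the iteration converges from poor starting points, but the linear solve against the Jacobian is the same.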
The autodiff implementation combines the implicit function theorem with an adjoint method. The latter was implemented by @jgaeb, who wrote a very nice blog post on the subject.
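That combination can be sketched in plain numpy (toy F and downstream scalar g of my own choosing, not Stan's implementation): the implicit function theorem gives dV/dtheta, and the adjoint trick avoids ever forming that full Jacobian, replacing it with a single linear solve against the transposed ∂F/∂V.

```python
import numpy as np

def F(V, theta):
    # Implicit system defining V(theta) in R^2 (toy choice).
    return np.array([V[0] + V[1]**3 - theta[0],
                     V[0]**3 + V[1] - theta[1]])

def dF_dV(V):
    # Jacobian of F with respect to V.
    return np.array([[1.0, 3.0 * V[1]**2],
                     [3.0 * V[0]**2, 1.0]])

dF_dtheta = -np.eye(2)  # F depends on theta only through -theta here

def solve_V(theta):
    # Newton iteration for F(V, theta) = 0.
    V = np.zeros(2)
    for _ in range(100):
        step = np.linalg.solve(dF_dV(V), -F(V, theta))
        V += step
        if np.linalg.norm(step) < 1e-12:
            break
    return V

def g(V):
    # Scalar downstream quantity (e.g. a log-likelihood term).
    return V[0]**2 + V[1]

theta = np.array([2.0, 2.0])
V = solve_V(theta)  # by symmetry V = (1, 1)

# Adjoint method: one linear solve instead of the full Jacobian dV/dtheta.
g_V = np.array([2.0 * V[0], 1.0])       # dg/dV at the solution
lam = np.linalg.solve(dF_dV(V).T, g_V)  # adjoint solve with (dF/dV)^T
grad = -lam @ dF_dtheta                 # dg/dtheta

# Check against finite differences of the composed map theta -> g(V(theta)).
eps = 1e-6
fd = np.array([(g(solve_V(theta + eps * e)) - g(solve_V(theta - eps * e)))
               / (2.0 * eps) for e in np.eye(2)])
print(np.allclose(grad, fd, atol=1e-6))  # True
```

In reverse-mode autodiff the payoff is that the cost of the gradient is one extra linear solve per scalar output, independent of the dimension of theta.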
There are some extra steps to go from an autodiffed algebraic solver to an autodiffed optimizer (see Section 2.3.3 of the paper Michael and I wrote), but if you’re able and willing to hand-code the gradient of your objective function, you should be able to use Stan as is.
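A scalar sketch of those extra steps, with an objective of my own choosing (not from the paper): treat the optimizer's stationarity condition as the algebraic equation, supply the hand-coded gradient, and the implicit function theorem then differentiates through the argmin.

```python
import numpy as np

def grad_f(V, theta):
    # Hand-coded gradient of the objective f(V, theta) = (V - theta)**2 + V**4.
    return 2.0 * (V - theta) + 4.0 * V**3

def hess_f(V):
    # Hand-coded second derivative of f in V.
    return 2.0 + 12.0 * V**2

def argmin_V(theta):
    # Minimize f over V by Newton's method on the gradient:
    # the optimum satisfies grad_f(V, theta) = 0, which plays the
    # role of the algebraic equation F(V, theta) = 0.
    V = 0.0
    for _ in range(100):
        step = grad_f(V, theta) / hess_f(V)
        V -= step
        if abs(step) < 1e-12:
            break
    return V

theta = 1.0
V = argmin_V(theta)

# Implicit function theorem on the stationarity condition:
# dV/dtheta = -(d2f/dV2)^{-1} * d2f/dVdtheta, and here d2f/dVdtheta = -2.
dV_dtheta = 2.0 / hess_f(V)

# Check against a finite difference of the argmin itself.
eps = 1e-6
fd = (argmin_V(theta + eps) - argmin_V(theta - eps)) / (2.0 * eps)
print(abs(dV_dtheta - fd) < 1e-6)  # True
```

The role of ∂F/∂V is now played by the Hessian of the objective, which is why having the gradient in closed form makes the whole construction usable as is.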
These solvers were originally motivated by steady-state models in pharmacometrics. See the StanCon 2018 notebook for an example.
Right. Our goal was to provide a general and formal framework for implicit functions (see this Twitter post for an overview).
These concepts also play a role in the adjoint-differentiated Laplace approximation. There, I didn’t treat the optimizer as a black box (following the example of Rasmussen & Williams (2006)): terms computed during the final Newton step of the optimizer are reused to calculate the gradient. Crucially, this avoids computing a Cholesky (or LU) decomposition, which would otherwise dominate the cost of the gradient, as shown by @jgaeb (here). For now, to get those implementation details, you have to dig into Chapter 5 of my thesis.