Optimization for computing likelihood

Lu.Zhang · July 26, 2022, 5:15am

Hi everyone,

Is it possible to use an optimization algorithm such as L-BFGS inside the likelihood evaluation in Stan? I had a conversation with Professor Andriy Norets from the Department of Economics at Brown during the ISBA meeting. He told me that he tried Stan for one of his research projects but found that his model could not be implemented in Stan. We exchanged follow-up emails over the past few days, and he provided more details about the obstacles:

Both issues (solving a nonlinear optimization problem and a
combination of hmc and reversible jump) come up in the following
paper:
paper link
Abstractly speaking, in this paper, the likelihood function depends on
the parameter through a function, say V(theta), that can be evaluated
by solving a nonlinear equation F(V,theta)=0.
Map F is differentiable so the derivative of V(theta) wrt theta can be
computed by the implicit function theorem. Then we use the matlab HMC
package to simulate from the posterior.
…
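
For readers who want the implicit function theorem step from the quoted email written out, here is a short sketch. It assumes F is continuously differentiable and that the Jacobian of F with respect to V is invertible at the solution; the notation is mine, not taken from the paper. Differentiating the identity F(V(theta), theta) = 0 with respect to theta gives:

$$
\frac{\partial F}{\partial V}\,\frac{dV}{d\theta} + \frac{\partial F}{\partial \theta} = 0
\quad\Longrightarrow\quad
\frac{dV}{d\theta} = -\left[\frac{\partial F}{\partial V}\right]^{-1}\frac{\partial F}{\partial \theta}.
$$

So the derivative of V only needs the partial derivatives of F evaluated at the solution, not the derivatives of the solver's iterations.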

Apart from the optimization problem, he also mentioned the difficulty of implementing “reversible jump” in Stan. I think it is better to focus on one problem per post, so I will skip that part here. I believe that allowing optimization inside the likelihood evaluation could be useful for some economic models, but I can also imagine that it would make automatic differentiation more difficult. Any thoughts on this topic? Thank you!

Best,

laifuthegreat · July 26, 2022, 4:45pm

As far as I can remember, Stan can autodiff through the algebraic solver using the implicit function theorem, though some issues with it were fixed recently, so you’ll want to be on the latest version:

github.com/stan-dev/math

Feature/issue 2401 alg solver adjoint

stan-dev:develop ← jgaeb:feature/issue-2401-alg-solver-adjoint

opened 08:39PM - 12 Mar 21 UTC

jgaeb

+1191 -808

## Summary

Fixes Issue #2401. (See the discourse thread [here](https://discou…rse.mc-stan.org/t/algebraic-solver-differentiation-speedup/20845/17).) This pull request changes the implementation of auto diff in the Powell and Newton solvers to more efficiently compute cotangents by replacing matrix inversion with a smaller number of matrix solves. (Parallel changes were made for the fixed point solver, but those will be put into a different PR.) The new solution method was benchmarked against the old solution method across a variety of problem sizes. (See [this comment](https://github.com/stan-dev/math/pull/2421#issuecomment-840968622).)

## Tests

Additional tests have been added for the variadic interfaces, and `make_unsafe_chainable_ptr()`.

## Side Effects

In the course of making those changes, variadic interfaces (`algebra_solver_powell_impl` and `algebra_solver_newton_impl`) were added for both solvers.

## Release notes

Updated Powell and Newton solvers to use an adjoint method to propagate derivatives in reverse mode. Should result in modest speed-up. Added variadic interfaces (`algebra_solver_powell_impl` and `algebra_solver_newton_impl`).

## Checklist

- [x] Math issue #2401
- [x] Copyright holder: Johann D. Gaebler

  The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to the license the submitted work under the following licenses:
  - Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
  - Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
- [x] the basic tests are passing
  - unit tests pass (to run, use: `./runTests.py test/unit`)
  - header checks pass, (`make test-headers`)
  - dependencies checks pass, (`make test-math-dependencies`)
  - docs build, (`make doxygen`)
  - code passes the built in [C++ standards](https://github.com/stan-dev/stan/wiki/Code-Quality) checks (`make cpplint`)
- [x] the code is written in idiomatic C++ and changes are documented in the doxygen
- [x] the new changes are tested

Lu.Zhang · July 26, 2022, 5:17pm

I see, sorry for not noticing the algebraic equation solver in Stan and thank you so much for the information!

Bob_Carpenter · July 26, 2022, 8:09pm

On reversible jump: that’s right, Stan can’t handle different numbers of parameters in the posterior.

On the nonlinear equation: that’s right. We have an algebraic solver that can solve nonlinear equations, and it uses the implicit function theorem for autodiff. @charlesm93 and @betanalpha wrote a paper on autodiff with implicit functions.
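
To make the idea concrete outside of Stan, here is a minimal plain-Python sketch of differentiating through a solver with the implicit function theorem. The residual `F(V, theta) = V + exp(V) - theta` is a made-up scalar example (not the model from the paper), and the hand-written Newton loop is a stand-in for illustration, not Stan's solver.

```python
import math

# Hypothetical residual: V(theta) is defined implicitly by F(V, theta) = 0.
def F(V, theta):
    return V + math.exp(V) - theta

def dF_dV(V, theta):
    return 1.0 + math.exp(V)

def dF_dtheta(V, theta):
    return -1.0

def solve_V(theta, guess=0.0, tol=1e-12, max_iter=100):
    """Solve F(V, theta) = 0 for V with a plain Newton iteration."""
    V = guess
    for _ in range(max_iter):
        step = F(V, theta) / dF_dV(V, theta)
        V -= step
        if abs(step) < tol:
            return V
    raise RuntimeError("Newton iteration did not converge")

def dV_dtheta(theta):
    """Implicit function theorem: dV/dtheta = -(dF/dV)^{-1} * dF/dtheta."""
    V = solve_V(theta)
    return -dF_dtheta(V, theta) / dF_dV(V, theta)

theta = 2.0
V_star = solve_V(theta)
ift_grad = dV_dtheta(theta)

# Sanity check against a central finite difference of the solver output.
h = 1e-6
fd_grad = (solve_V(theta + h) - solve_V(theta - h)) / (2 * h)
print(V_star, ift_grad, fd_grad)
```

The gradient only uses partial derivatives of `F` at the converged solution, so it does not matter which solver produced `V_star`.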

There were plans to extend that to differential algebraic equations, but I don’t know where the dev’s at on that.

charlesm93 · July 27, 2022, 2:54pm

Indeed, Stan supports two algebraic solvers: a Newton solver built around KINSOL and a dogleg solver using one of Eigen’s (unsupported) modules. I recommend using KINSOL. This is not the same as an L-BFGS solver, but the automatic differentiation should work on any solver.

The autodiff implementation combines the implicit function theorem and an adjoint method. The latter feature was implemented by @jgaeb, who wrote a very nice blog post on the subject.
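
As a rough sketch of how the two pieces fit together (my notation, assuming V has dimension n and the Jacobian of F with respect to V is invertible at the solution): in reverse mode, the solver receives a cotangent $w = \partial \ell / \partial V$ from downstream and must return $\partial \ell / \partial \theta$. By the implicit function theorem,

$$
\frac{\partial \ell}{\partial \theta}
= \left(\frac{dV}{d\theta}\right)^{\!\top} w
= -\left(\frac{\partial F}{\partial \theta}\right)^{\!\top}
\left(\frac{\partial F}{\partial V}\right)^{\!-\top} w .
$$

Rather than forming the full Jacobian $dV/d\theta$, the adjoint approach solves a single $n \times n$ linear system and then applies one matrix-vector product:

$$
\left(\frac{\partial F}{\partial V}\right)^{\!\top} \eta = w,
\qquad
\frac{\partial \ell}{\partial \theta} = -\left(\frac{\partial F}{\partial \theta}\right)^{\!\top} \eta .
$$

This is consistent with the pull request description above, which mentions replacing matrix inversion with a smaller number of matrix solves.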

There are some extra steps to go from an autodiffed algebraic solver to an autodiffed optimizer (see Section 2.3.3 in the paper Michael and I wrote), but if you’re able and willing to hand-code the gradient of your objective function, you should be able to use Stan as is.
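
One way to read the hand-coded-gradient suggestion: pose the optimization as a root-finding problem on the first-order condition, then apply the implicit function theorem to that condition. Below is a minimal plain-Python sketch with a made-up objective `f(x, theta) = exp(x) - theta * x`, whose minimizer is `log(theta)` for positive `theta`. The function names are hypothetical, and the Newton loop stands in for an algebraic solver; it is not Stan code.

```python
import math

# Hypothetical objective: f(x, theta) = exp(x) - theta * x, with theta > 0.
# Hand-coded first-order condition: g(x, theta) = df/dx = exp(x) - theta.
def grad_x(x, theta):
    return math.exp(x) - theta

# Derivatives of the first-order condition g with respect to x and theta.
def hess_xx(x, theta):
    return math.exp(x)

def hess_x_theta(x, theta):
    return -1.0

def argmin(theta, guess=0.0, tol=1e-12, max_iter=100):
    """Find the stationary point by solving grad_x(x, theta) = 0 with Newton."""
    x = guess
    for _ in range(max_iter):
        step = grad_x(x, theta) / hess_xx(x, theta)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton iteration did not converge")

def dargmin_dtheta(theta):
    """Implicit function theorem applied to g: dx*/dtheta = -g_x^{-1} * g_theta."""
    x = argmin(theta)
    return -hess_x_theta(x, theta) / hess_xx(x, theta)

theta = 3.0
x_star = argmin(theta)
sensitivity = dargmin_dtheta(theta)
print(x_star, math.log(theta))   # the two values should agree
print(sensitivity, 1.0 / theta)  # the two values should agree
```

A root of the gradient is only a stationary point, so this sketch relies on the objective being convex in `x`; for a general objective, checking that the solution is actually a minimum is one of the extra steps.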

More details…

These solvers were originally motivated by steady state models in Pharmacometrics. See the StanCon 2018 notebook for an example.

Right. Our goal was to provide a general and formal framework for implicit functions (see this Twitter post for an overview).

These concepts also play a role in the adjoint-differentiated Laplace approximation. There, I didn’t treat the optimizer as a black box (following the example of Rasmussen & Williams (2006)): terms used during the final Newton step of the optimizer are reused to calculate the gradient. Crucially, this avoids computing a Cholesky decomposition (or LU decomposition), which otherwise dominates the computation of the gradient, as shown by @jgaeb (here). Right now, to get those implementation details, you have to dig out Chapter 5 of my thesis.

In any case: a deep and very exciting topic!!

Lu.Zhang · July 27, 2022, 4:20pm

Thank you so much for the information!

Lu.Zhang · July 27, 2022, 4:26pm

Thank you for such a detailed introduction! This is very helpful!
