Technically, our optimizer finds penalized maximum likelihood estimates.
They’re not MAP estimates because we drop the Jacobian adjustment for the
implicit priors on the transformed (unconstrained) parameters, so the
optimum isn’t the mode of the density on either scale’s posterior. We’ll
probably be adding a proper MAP estimator soon.
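To make the Jacobian point concrete, here’s a minimal sketch in Python (the LogNormal toy model and all the names are mine, not Stan code): a positive parameter sigma is optimized on the unconstrained scale u = log(sigma), and dropping vs. keeping the log-Jacobian term changes which mode you find.

```python
import numpy as np
from scipy import optimize, stats

# Toy: sigma > 0 with a LogNormal(0, 1) prior, optimized on the
# unconstrained scale u = log(sigma).

def neg_log_density(u, include_jacobian):
    sigma = np.exp(u)
    lp = stats.lognorm.logpdf(sigma, s=1.0)  # log p(sigma)
    if include_jacobian:
        lp += u  # log |d sigma / d u| = u for sigma = exp(u)
    return -lp

# Penalized MLE (Jacobian dropped): mode of p(sigma) on the sigma scale.
u_pmle = optimize.minimize_scalar(lambda u: neg_log_density(u, False)).x
# Jacobian kept: mode of the density of u, pushed back through exp.
u_map = optimize.minimize_scalar(lambda u: neg_log_density(u, True)).x

print(np.exp(u_pmle))  # ~ exp(-1) ~= 0.368, the mode in sigma
print(np.exp(u_map))   # ~ 1.0, i.e., exp of the mode in u
```

Same model, same optimizer, two different answers, which is why the result without the Jacobian isn’t a MAP estimate on the unconstrained scale.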
Andrew likes to think of all these methods as approximate Bayes.
The penalized max likelihood estimate gives you the center of a Laplace
approximation, from which you can gauge uncertainty in the same way as
with variational inference; the difference is that the approximating
normal is centered on the mode rather than on an approximate mean.
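Here’s a minimal sketch of the Laplace idea in Python (the Gamma toy posterior is my choice for illustration): find the mode, measure the curvature there, and use a normal with that mean and inverse-curvature variance.

```python
import numpy as np
from scipy import optimize

# Toy posterior: theta ~ Gamma(shape=3, rate=1),
# so log p(theta) = 2*log(theta) - theta + const.

def neg_log_post(theta):
    return -(2.0 * np.log(theta) - theta)

# 1) Find the mode (where a penalized-MLE-style optimizer would land).
res = optimize.minimize_scalar(neg_log_post, bounds=(1e-6, 50.0),
                               method="bounded")
mode = res.x

# 2) Curvature at the mode via a finite-difference second derivative.
h = 1e-4
hess = (neg_log_post(mode + h) - 2 * neg_log_post(mode)
        + neg_log_post(mode - h)) / h**2

# 3) Laplace approximation: Normal(mode, 1/hess).
laplace_sd = np.sqrt(1.0 / hess)
print(mode)        # ~ 2.0 (the mode; the true posterior mean is 3.0)
print(laplace_sd)  # ~ sqrt(2) ~= 1.41 (true posterior sd is sqrt(3))
```

The gap between the mode (2.0) and the mean (3.0) in this skewed example is exactly the mode-vs.-mean distinction above.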
There are lots of cases where optimization won’t work in theory
(because there’s no MLE, as in a typical hierarchical regression
model, where the density grows without bound as the hierarchical
variance shrinks to zero) but variational inference would work
(because there is a posterior mean).
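A quick numerical sketch of why the hierarchical MLE doesn’t exist (the tiny normal-normal model and data here are made up for illustration): put every group effect at the population mean and shrink the group-level scale, and the joint density diverges.

```python
import numpy as np
from scipy import stats

# Hierarchical normal model: y_j ~ N(theta_j, 1), theta_j ~ N(mu, tau).
# Set theta_j = mu and let tau -> 0: the N(mu, tau) factors blow up,
# so the joint density over (mu, tau, theta) has no finite maximum.

y = np.array([-1.0, 0.5, 2.0])  # made-up group observations

def joint_log_density(mu, tau):
    theta = np.full_like(y, mu)  # put every group effect at mu
    return (stats.norm.logpdf(y, theta, 1.0).sum()
            + stats.norm.logpdf(theta, mu, tau).sum())

for tau in [1.0, 0.1, 0.01]:
    print(tau, joint_log_density(0.5, tau))  # grows as tau shrinks
```

The posterior mean is still perfectly well defined here, which is why a mean-seeking method like variational inference can succeed where mode-seeking optimization cannot.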
I tend to think of “stochastic” methods as meaning ones that
stream over data, like stochastic gradient descent (which can
be deterministic if you don’t randomize mini-batches).
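For concreteness, here’s SGD in that streaming-over-mini-batches sense, sketched in Python on a made-up least-squares problem; note the pass over fixed-order batches is fully deterministic because nothing is shuffled.

```python
import numpy as np

# SGD for least squares, streaming over fixed mini-batches.
# With a fixed batch order (no shuffling), the updates are deterministic.

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
w_true = np.array([1.5, -2.0])
y = X @ w_true + 0.1 * rng.normal(size=1000)

w = np.zeros(2)
lr, batch = 0.05, 50
for epoch in range(20):
    for i in range(0, len(y), batch):      # fixed order: deterministic
        Xb, yb = X[i:i + batch], y[i:i + batch]
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(yb)  # mini-batch gradient
        w -= lr * grad

print(w)  # ~ [1.5, -2.0]
```

Each update only ever touches one mini-batch, which is what makes the method suitable for data that arrive as a stream.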
I’m not sure what you mean by needing stochastic optimization
methods. The ADVI paper also talks about using stochastic
variational inference (in the streaming data sense), hence my
confusion about what we’re talking about.
I didn’t go into details, but that BB method you cite sounds
like it has the same motivation as the L-BFGS method we currently
use, which also approximates inverse Hessian-vector products using
gradients without computing full Hessians.
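The gradient-only inverse-Hessian-vector product in L-BFGS is the standard two-loop recursion; here’s a sketch in Python (the quadratic test problem and iterate history are made up) that applies the approximate inverse Hessian to a vector using only stored differences of points and gradients.

```python
import numpy as np

# L-BFGS two-loop recursion: apply an approximate inverse Hessian to a
# vector using only stored pairs s_k = x_{k+1} - x_k and
# y_k = grad_{k+1} - grad_k, never a full Hessian.

def lbfgs_apply_inv_hessian(v, s_list, y_list):
    q = v.copy()
    alphas = []
    for s, y in zip(reversed(s_list), reversed(y_list)):  # newest first
        rho = 1.0 / (y @ s)
        alpha = rho * (s @ q)
        alphas.append(alpha)
        q -= alpha * y
    # Initial Hessian scaling from the most recent pair.
    s, y = s_list[-1], y_list[-1]
    q *= (s @ y) / (y @ y)
    for (s, y), alpha in zip(zip(s_list, y_list), reversed(alphas)):
        rho = 1.0 / (y @ s)
        beta = rho * (y @ q)
        q += (alpha - beta) * s
    return q  # ~= H^{-1} v

# Sanity check on a quadratic f(x) = 0.5 x' A x, where y_k = A s_k.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
grad = lambda x: A @ x
xs = [np.array([1.0, 0.0]), np.array([0.0, 1.0]),
      np.array([1.0, 1.0]), np.array([0.5, -0.5])]
s_list = [xs[i + 1] - xs[i] for i in range(3)]
y_list = [grad(xs[i + 1]) - grad(xs[i]) for i in range(3)]
g = np.array([1.0, 2.0])
print(lbfgs_apply_inv_hessian(g, s_list, y_list))  # close to solve(A, g)
```

The recursion costs O(m n) for m stored pairs in n dimensions, versus O(n^2) to even store a Hessian, which is the whole point of the quasi-Newton construction.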
If comments are truly noise, we just ignore them.
- Bob