Continuing optimization with lbfgs/bfgs after it has been terminated

Hi all

I just wanted to know whether it is possible to continue an optimization after a stopping criterion has been reached (in this case the maximum number of iterations)? Practically speaking, we should be able to use the final iterate as the initialization for a new run, and this works for simple models. However, for more complex models, specifically models with a large number of parameters, the restarted optimization always fails: the initial log probability is computed, but it fails to find the next point. I do not know the maths behind lbfgs in detail, but my limited understanding is that it needs an initial Hessian approximation which is then updated at each iteration? Is there a way for me to first obtain the final Hessian approximation and then supply it to cmdstanpy as the initial approximation for the new run? Or is there some other way to achieve this?
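To make the setup concrete, the restart I have in mind looks roughly like this with cmdstanpy (file names and iteration counts are placeholders, and the exact arguments are only illustrative):

```python
from cmdstanpy import CmdStanModel

# placeholder file names
model = CmdStanModel(stan_file="model.stan")

# first run, capped at a maximum number of iterations;
# require_converged=False keeps cmdstanpy from raising if the run
# stops on the iteration limit rather than on convergence
fit1 = model.optimize(data="data.json", algorithm="lbfgs",
                      iter=1000, require_converged=False, seed=1)

# restart from where the first run stopped: the returned estimates
# become the inits of a second run
inits = fit1.stan_variables()
fit2 = model.optimize(data="data.json", algorithm="lbfgs",
                      iter=1000, inits=inits, seed=1)
```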

I’m afraid not. L-BFGS builds a low-rank approximation of the Hessian implicitly from the last J gradient evaluations, plus a diagonal term to make it positive definite, and that approximation is updated at every point visited. There’s no way to initialize the algorithm with this information.
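For intuition, here is a minimal sketch of the standard L-BFGS two-loop recursion (as in Nocedal & Wright), not Stan's actual implementation: the inverse-Hessian approximation is never formed as a matrix, only applied implicitly through the stored (s, y) pairs, which is why there is nothing you could extract and hand back as an initialization.

```python
import numpy as np

def lbfgs_direction(grad, s_list, y_list):
    """Apply the implicit inverse-Hessian approximation built from the
    last J pairs s_k = x_{k+1} - x_k, y_k = grad_{k+1} - grad_k to the
    current gradient, returning a descent direction."""
    q = grad.copy()
    rhos = [1.0 / np.dot(y, s) for s, y in zip(s_list, y_list)]
    alphas = []
    # first loop: newest pair to oldest
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        alpha = rho * np.dot(s, q)
        q -= alpha * y
        alphas.append(alpha)
    # initial (diagonal) scaling of the inverse Hessian
    if s_list:
        gamma = np.dot(s_list[-1], y_list[-1]) / np.dot(y_list[-1], y_list[-1])
    else:
        gamma = 1.0
    r = gamma * q
    # second loop: oldest pair to newest
    for s, y, rho, alpha in zip(s_list, y_list, rhos, reversed(alphas)):
        beta = rho * np.dot(y, r)
        r += (alpha - beta) * s
    return -r  # with no stored pairs this reduces to steepest descent
```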

If the goal is to get optimization to complete, it’s usually a matter of finding a reasonably stable starting point and parameterizing the model so that the target doesn’t get too far out of hand.

You can use BridgeStan to evaluate Hessians and see what conditioning looks like.
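For example, a minimal sketch with the BridgeStan Python interface (file names are placeholders, and the point you evaluate at is up to you, e.g. the iterate where optimization stalls):

```python
import numpy as np
import bridgestan as bs

# placeholder file names
model = bs.StanModel("model.stan", "data.json")

# a point on the unconstrained scale (zeros here only as a placeholder)
theta_unc = np.zeros(model.param_unc_num())

# log density, gradient, and Hessian at that point
lp, grad, hess = model.log_density_hessian(theta_unc)

# eigenvalues of the negative Hessian; near a mode these should all be
# positive, and a huge spread (ill-conditioning) is a warning sign
eig = np.linalg.eigvalsh(-hess)
print("log density:", lp)
print("smallest / largest eigenvalue:", eig.min(), eig.max())
print("rough condition number:", eig.max() / abs(eig.min()))
```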


Thanks Bob. I will look at this.

What I have noticed is that for my model (high-dimensional and non-linear), optimization typically fails from random starting points. What I am currently doing is running a fairly short sampling chain (from random inits) and using the draw with the maximum target as the inits for the optimization method. At that starting point the target is fairly large, typically on the order of -1e7 to -1e9, and it then increases to roughly the order of -1e4. When the maximum number of iterations is reached at that point and I try to use the final iterate of the optimization as inits for a new run (to continue the optimization), lbfgs fails every time.
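Roughly, that sample-then-optimize workflow looks like this in cmdstanpy (file names and iteration counts are placeholders):

```python
import numpy as np
from cmdstanpy import CmdStanModel

# placeholder file names
model = CmdStanModel(stan_file="model.stan")

# short exploratory chain from random inits
fit = model.sample(data="data.json", chains=1,
                   iter_warmup=200, iter_sampling=200, seed=1)

# pick the draw with the highest log density (lp__)
best = int(np.argmax(fit.draws_pd()["lp__"].to_numpy()))
inits = {name: draws[best] for name, draws in fit.stan_variables().items()}

# start L-BFGS from that draw; require_converged=False keeps cmdstanpy
# from raising if the run stops on the iteration limit
mle = model.optimize(data="data.json", inits=inits, algorithm="lbfgs",
                     iter=2000, require_converged=False)
```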

The increases I see in the target are also fairly small, varying from about 0.1 to 100 per iteration, and the ||dx|| value reported by lbfgs is really small, around 1e-6. I initially thought the optimization should therefore be close to a local optimum, but these small changes in the target keep accumulating.

Another thing to point out is that I tried initializing newton with the final iterate from lbfgs; however, the first newton iteration decreases the target again (on top of being significantly slower due to the calculation of the Hessian). I assume the newton method in Stan fails at that point and falls back on some kind of line search (similar to bfgs) to find a new point fairly close to the given inits, even though that point decreases the target?

But as you mentioned, looking at the conditioning of the Hessian should be a good place to start when diagnosing the problem.