Thank you! The problem is that it requires the Hessian in higher dimensions. That’s why I thought
of getting the Levenberg–Marquardt algorithm working, since it approximates the Hessian matrix.
https://en.wikipedia.org/wiki/Newton%27s_method_in_optimization#Higher_dimensions
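Concretely (my summary in standard notation, nothing specific to that page): for a least-squares objective, Levenberg–Marquardt never forms the true Hessian. It works from the residual Jacobian alone, using the damped Gauss–Newton approximation:

```latex
f(\theta) = \tfrac{1}{2}\,\lVert r(\theta) \rVert^2, \qquad
\nabla^2 f(\theta) \approx J(\theta)^\top J(\theta), \qquad
\bigl( J^\top J + \lambda I \bigr)\,\delta = -\,J^\top r
```

Large $\lambda$ pushes the step toward gradient descent, while $\lambda \to 0$ recovers a full Gauss–Newton step.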
That looks interesting, thanks for sharing. I’ll dig through it and bring it up at the next Stan dev meeting.
It’s already implemented in Eigen’s unsupported modules: https://eigen.tuxfamily.org/dox/unsupported/classEigen_1_1LevenbergMarquardt.html
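For anyone who wants to kick the tires, here’s a minimal sketch of what using that module looks like. The functor layout (a `DenseFunctor` base, `operator()` for residuals, `df` for the Jacobian) follows the module’s conventions, but the model y = a·exp(b·t), the `ExpFitFunctor` name, and the toy data are all made up for illustration:

```cpp
#include <cmath>
#include <iostream>
#include <Eigen/Dense>
#include <unsupported/Eigen/LevenbergMarquardt>

// Hypothetical example: fit y = a * exp(b * t) by nonlinear least
// squares. We only supply residuals and their Jacobian; the solver
// builds its J^T J Hessian approximation internally.
struct ExpFitFunctor : Eigen::DenseFunctor<double> {
  Eigen::VectorXd t, y;
  ExpFitFunctor(const Eigen::VectorXd& t_, const Eigen::VectorXd& y_)
      : Eigen::DenseFunctor<double>(2, static_cast<int>(t_.size())),
        t(t_), y(y_) {}

  // Residuals r_i = a * exp(b * t_i) - y_i, with p = (a, b)
  int operator()(const Eigen::VectorXd& p, Eigen::VectorXd& r) const {
    for (int i = 0; i < values(); ++i)
      r(i) = p(0) * std::exp(p(1) * t(i)) - y(i);
    return 0;
  }

  // Jacobian of the residuals with respect to (a, b)
  int df(const Eigen::VectorXd& p, Eigen::MatrixXd& J) const {
    for (int i = 0; i < values(); ++i) {
      J(i, 0) = std::exp(p(1) * t(i));
      J(i, 1) = p(0) * t(i) * std::exp(p(1) * t(i));
    }
    return 0;
  }
};

int main() {
  Eigen::VectorXd t(5), y(5);
  t << 0, 1, 2, 3, 4;
  y << 2.0, 2.7, 3.6, 4.9, 6.6;  // roughly 2 * exp(0.3 * t)

  ExpFitFunctor f(t, y);
  Eigen::LevenbergMarquardt<ExpFitFunctor> lm(f);
  Eigen::VectorXd p(2);
  p << 1.0, 0.1;   // starting point
  lm.minimize(p);  // p now holds the fitted (a, b)
  std::cout << "a = " << p(0) << ", b = " << p(1) << "\n";
}
```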
Is that related to L-BFGS (limited memory quasi-Newton), which approximates an inverse Hessian-vector product based on a history of a few gradients? We get cheap gradients with Stan.
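For concreteness, the L-BFGS machinery I have in mind is the standard two-loop recursion (Nocedal & Wright); here’s a rough Eigen sketch of it, not anything from Stan’s codebase, just to show it only needs dot products against a short history of steps and gradient differences:

```cpp
#include <vector>
#include <Eigen/Dense>

// Two-loop recursion: given the current gradient and histories
// s_i = x_{i+1} - x_i and y_i = g_{i+1} - g_i, return an approximate
// inverse-Hessian-vector product H^{-1} g. No Hessian is ever stored.
Eigen::VectorXd lbfgs_direction(const Eigen::VectorXd& grad,
                                const std::vector<Eigen::VectorXd>& s,
                                const std::vector<Eigen::VectorXd>& y) {
  const int m = static_cast<int>(s.size());
  std::vector<double> alpha(m), rho(m);
  Eigen::VectorXd q = grad;

  // First loop: newest to oldest history pair
  for (int i = m - 1; i >= 0; --i) {
    rho[i] = 1.0 / y[i].dot(s[i]);
    alpha[i] = rho[i] * s[i].dot(q);
    q -= alpha[i] * y[i];
  }

  // Initial inverse-Hessian scaling H0 = gamma * I
  double gamma =
      m > 0 ? s[m - 1].dot(y[m - 1]) / y[m - 1].dot(y[m - 1]) : 1.0;
  Eigen::VectorXd r = gamma * q;

  // Second loop: oldest to newest
  for (int i = 0; i < m; ++i) {
    double beta = rho[i] * y[i].dot(r);
    r += (alpha[i] - beta) * s[i];
  }
  return r;  // approx H^{-1} * grad; the step direction is -r
}
```

Each call touches only the gradient and O(mn) dot products, which is why cheap gradients are exactly what it wants.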
Also, if computing a Hessian is the bottleneck, the computation parallelizes naturally, since each row (or column) can be computed independently. So we might be able to hit some scaling goals by throwing hardware at the problem; something like the sketch below.
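A rough sketch of what I mean (mine, not Stan code): each column of the Hessian is the derivative of the gradient in one coordinate direction, so with a gradient function in hand (`grad_f` below is a stand-in for a cheap autodiff gradient) the columns can be filled in embarrassingly parallel, e.g. with OpenMP:

```cpp
#include <functional>
#include <Eigen/Dense>

// Column i of the Hessian is d(grad f)/dx_i, estimated here by central
// differences of a user-supplied gradient. Columns are independent, so
// the loop splits across cores (compile with -fopenmp; the pragma is a
// no-op otherwise).
Eigen::MatrixXd hessian_by_columns(
    const std::function<Eigen::VectorXd(const Eigen::VectorXd&)>& grad_f,
    const Eigen::VectorXd& x, double h = 1e-5) {
  const int n = static_cast<int>(x.size());
  Eigen::MatrixXd H(n, n);
#pragma omp parallel for
  for (int i = 0; i < n; ++i) {
    Eigen::VectorXd xp = x, xm = x;
    xp(i) += h;
    xm(i) -= h;
    H.col(i) = (grad_f(xp) - grad_f(xm)) / (2.0 * h);
  }
  return H;
}
```

Each column costs two gradient evaluations, so the whole Hessian is 2n gradients, and those scale across however many cores we throw at it.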