Hi!
Does anyone know whether the Hessian matrix returned by the optimizing function in rstan is the Hessian of the posterior or of the log-posterior distribution?
Regards
Robin
I’ll hazard a guess. The objective function of the optimization is in log space. I’m almost certain of this because Stan’s distribution functions assume log space and numerical stability requires it. The gradient is computed from that objective function, so it is the gradient of the log objective, and the Hessian is effectively the gradient of the gradient, so it too is based on the log space.
Mode and Hessian are for the log-density. A quadratic approximation using the mode and Hessian of the log-density corresponds to a Gaussian approximation of the density (see, e.g., Ch 4 in BDA3). Note also that the approximation is made in the unconstrained space. RStan draws from the approximation, computes importance weights with respect to the true density (used for diagnostics), and transforms the draws to the constrained space.
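For concreteness, here is a minimal rstan sketch (the model and data are made up for illustration) showing where the mode and the Hessian of the log-density appear in the output of optimizing:

```r
library(rstan)

# Toy model with one constrained parameter (sigma > 0), just for illustration.
model_code <- "
data { int<lower=0> N; vector[N] y; }
parameters { real mu; real<lower=0> sigma; }
model { y ~ normal(mu, sigma); }
"
sm  <- stan_model(model_code = model_code)
fit <- optimizing(sm,
                  data    = list(N = 20, y = rnorm(20)),
                  hessian = TRUE,   # Hessian of the log-density in the unconstrained space
                  draws   = 1000)   # draws from the Gaussian approximation

fit$par       # point estimate, reported on the constrained scale
fit$hessian   # Hessian of the log-density at the mode (unconstrained parameterization)

# Covariance of the Gaussian approximation is the inverse negative Hessian:
Sigma_approx <- solve(-fit$hessian)
```

(If I recall correctly, in recent rstan versions the draws themselves are returned in the theta_tilde element of the result, already transformed to the constrained space.)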
Ok, I’ll put it differently. Is there a possibility that the algorithm finds a point estimate in the unconstrained space? Or does the algorithm only search for an estimate in the constrained space?
I’m not able to parse this sentence and there are too many alternatives to guess. Is there a word missing?
No, the algorithm works in the unconstrained space because it’s easier there (to put it simply), and when it finds a solution it transforms it back into the original space, and that’s what you get at the end.
A point estimate in the unconstrained space is also a point estimate in the constrained space respecting the constraints. A mode in the unconstrained space is not necessarily a mode in the constrained space.
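To illustrate that last point with a small standalone example (not rstan code, just a change of variables): take the density p(sigma) = exp(-sigma) on sigma > 0, whose mode in the constrained space is at sigma = 0. On the unconstrained scale theta = log(sigma), the log-density including the Jacobian term is -exp(theta) + theta, which is maximized at theta = 0, i.e. at sigma = 1.

```r
# Mode of the unconstrained log-density -exp(theta) + theta (includes the
# log-Jacobian term theta from the change of variables sigma = exp(theta)).
theta_hat <- optimize(function(theta) -exp(theta) + theta,
                      interval = c(-10, 10), maximum = TRUE)$maximum
exp(theta_hat)  # about 1, whereas the mode of p(sigma) = exp(-sigma) is at sigma = 0
```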
Ok, thanks. It seems to me that the L-BFGS algorithm and the other two are algorithms specifically for unconstrained optimization, so that makes sense.
Unfortunately I have some more questions:
- The description of the argument draws of the optimizing function says: “(…) how many times to draw from a multivariate normal distribution whose parameters are the mean vector and the inverse negative Hessian in the unconstrained space.” What is meant here by “mean vector”? The mean of the joint posterior density? Or maybe the point estimate (mode) which was calculated? I think the parameters for the normal approximation are the mode and the inverse of the observed information at the mode.
- Which type of transformation is used to transform the draws to the constrained space?
The mean of the approximating normal is set to the mode of the log-density. In the multidimensional case the mean is vector-valued.
The mean of the normal distribution is also the mode of the normal distribution, but it is more common to say that the parameters of the normal distribution are the mean and the covariance.
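Putting the two answers together, what the draws argument does is conceptually something like the following sketch (mode_unc and H are hypothetical placeholders standing in for the mode and Hessian that rstan computes internally in the unconstrained space; this is not rstan’s actual code):

```r
library(MASS)

mode_unc <- c(0.3, -1.2)                       # placeholder: mode in the unconstrained space
H        <- matrix(c(-4, 0.5, 0.5, -2), 2, 2)  # placeholder: Hessian of the log-density there

Sigma     <- solve(-H)                         # covariance = inverse negative Hessian
draws_unc <- mvrnorm(n = 1000, mu = mode_unc, Sigma = Sigma)

# rstan then transforms each draw back to the constrained space and computes
# importance weights against the true density for the diagnostics mentioned above.
```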