Hi!
Does somebody know whether the Hessian matrix returned by the optimizing function in rstan is the Hessian of the posterior or of the log-posterior density?
Regards
Robin
I'll hazard a guess. The objective function of the optimization is in log space. I'm almost certain of this because Stan's distribution functions work in log space and numerical stability requires it. The gradient is computed from that objective, so it is the gradient of the log density, and the Hessian is in effect the derivative of that gradient, hence it is all in log space as well.
Mode and Hessian are for the log density. A quadratic approximation to the log density using the mode and Hessian corresponds to a Gaussian approximation to the density (see, e.g., Ch. 4 in BDA3). Note also that the approximation is made in the unconstrained space. RStan draws from the approximation, computes importance weights with respect to the true density (used for diagnostics), and transforms the draws to the constrained space.
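To make that concrete, here is a minimal 1-D sketch of the idea (not RStan's implementation): take a toy non-Gaussian log density, form the Gaussian approximation from the mode and the inverse negative Hessian, draw from it, and compute log importance weights against the true unnormalized density. The target density and all numbers below are made up for illustration.

```python
import math
import random

def log_p(x):
    # toy unnormalized log density (non-Gaussian): -x^2/2 - x^4/4
    return -x**2 / 2 - x**4 / 4

# The mode is found analytically here (x = 0); RStan would use L-BFGS.
mode = 0.0
# Negative Hessian of log_p at the mode: -(d^2/dx^2) log_p = 1 + 3x^2 -> 1
neg_hessian = 1.0
var = 1.0 / neg_hessian  # inverse negative Hessian = variance of the approximation

random.seed(0)
draws = [random.gauss(mode, math.sqrt(var)) for _ in range(1000)]

def log_q(x):
    # log density of the Gaussian approximation N(mode, var)
    return -0.5 * (x - mode)**2 / var - 0.5 * math.log(2 * math.pi * var)

# log importance weights w.r.t. the true (unnormalized) density,
# usable as a diagnostic for how good the approximation is
log_w = [log_p(x) - log_q(x) for x in draws]
```

Here log_w reduces to -x^4/4 plus a constant, so the weights are largest near the mode, where the quadratic approximation is exact.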
I'm not able to parse this sentence and there are too many alternatives to guess. Is there a word missing?
Ok, I'll put it differently. Is it possible that the algorithm finds a point estimate in the unconstrained space? Or does the algorithm only search for an estimate in the constrained space?
No, the algorithm works in the unconstrained space because it's easier there (to put it simply), and when it finds a solution it transforms it back into the original space; that's what you get at the end.
A point estimate found in the unconstrained space, once transformed back, is also a point estimate in the constrained space that respects the constraints. A mode in the unconstrained space, however, is not necessarily a mode in the constrained space.
Ok, thanks. It seems to me that L-BFGS and the other two algorithms are designed specifically for unconstrained optimization, so that makes sense.
Unfortunately I have some more questions:

The description of the draws argument of the optimizing function says: "(…) how many times to draw from a multivariate normal distribution whose parameters are the mean vector and the inverse negative Hessian in the unconstrained space." What is meant here by "mean vector"? The mean of the joint posterior density? Or maybe the point estimate (mode) that was calculated? I think the parameters for the normal approximation are the mode and the inverse of the observed information at the mode.

Which type of transformation is used to transform the draws into the constrained space?
The mean of the approximating normal is set to the mode of the log density. In the multidimensional case the mean is vector valued.
The mean of the normal distribution is also the mode of the normal distribution, but it is more common to say that the parameters of a normal distribution are the mean and the covariance.
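Regarding the transformation question: for a parameter with a lower bound of zero, Stan's constraining transform is the exponential. A minimal sketch of the draws step for a single positive parameter (the mode and scale below are assumed values, not output from any real fit):

```python
import math
import random

# Sample in the unconstrained space from N(mode, sd^2), where sd^2 would be
# the inverse negative Hessian, then map back with the constraining
# transform (exp for a parameter bounded below by 0).
mode, sd = 1.0, 0.5  # assumed unconstrained mode and scale
random.seed(1)
unconstrained_draws = [random.gauss(mode, sd) for _ in range(500)]
constrained_draws = [math.exp(x) for x in unconstrained_draws]

# every transformed draw automatically respects the constraint
assert all(theta > 0 for theta in constrained_draws)
```

Other constraints use other transforms (e.g., an inverse-logit-type map for interval-bounded parameters); the Stan reference manual documents each one.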