The problem in the hessian matrix in optimizing function


#1

sd_in_optimizing_3.r (1.5 KB)

Dear all,
Hi

I think that the optimizing function in Stan is fast and better than optim function in R, but the hessian matrix results (and, of course, the standard deviation (sd) of the MLEs) are different.
If X~N(mu, sigma), we can obtain the exact value of MLEs and their sds. By running the attached file, we can see that optim and sampling (in Stan) functions have good sd, but the optimizing function results is unacceptable (of course, only for the sd of the sigma parameter).
Thank you for checking the file and specifying my mistake
.
Best,
-Bistoon


#2
set.seed(100)

n = 2000

x = rnorm(n, 0, 2)
loglike = function(x, par) -sum(log(dnorm(x,par[1],par[2])))

fit1 = optim(par=c(mean(x), sd(x)), loglike, x=x , hessian = TRUE)
fit1
$par
[1] 0.02095186 2.01174852

$value
[1] 4235.386

$counts
function gradient 
      35       NA 

$convergence
[1] 0

$message
NULL

$hessian
         [,1]     [,2]
[1,] 494.1771   0.0000
[2,]   0.0000 987.6151

fit2 = optimizing(mystanmodel, hessian = TRUE, data=list(n=n, x=x), init =  list(mu = mean(x), sigma = sd(x)))
Initial log joint probability = -2397.51
Optimization terminated normally: 
  Convergence detected: relative gradient magnitude is below tolerance
fit2
$par
        mu      sigma 
0.02095186 2.01124552 

$value
[1] -2397.508

$return_code
[1] 0

$hessian
                 mu         sigma
mu    -4.944243e+02  2.247603e-11
sigma  2.247603e-11 -4.000003e+03

I’m not a Stan developer. I don’t see very much difference in parameter estimate in mean and sd of fit1 and fit2, the hessian estimate differs in sigma/sigma. Is it that optimization is calculated at the unconstrained parameter space?

See: " hessian: ‘TRUE’ or ‘FALSE’ (the default): flag indicating whether to
calculate the Hessian (via numeric differentiation of the
gradient function in the unconstrained parameter space)."


#3

sd_in_optimizing_3.r (1.5 KB)

in optim: fit1$hessian[2,2] = 987.6151
in optimizing: fit2$hessian[2,2] = -4000

I changed fit1 as follows and attached new file here:

optim function in R (stats)

loglike = function(x, par) -sum(log(dnorm(x,par[1],par[2])))
lower = c(-Inf, 0)
upper = c(Inf, Inf)
fit1 = optim(par=c(mean(x), sd(x)), loglike, x=x , hessian = TRUE, method = “L-BFGS-B”, lower=lower, upper=upper)

Here, I think fit1 and fit2 have similar parameter space and optimizing method. The optim function seems to be a good result (in sd) and I want to get the same results with the optimizing function, but I have not been able to do this yet.


#4

The Hessian’s being computed on the unconstrained scale so that’d be log(sigma).

The accuracy with finite differences of gradients isn’t great, but it should be in the ballpark to several decimal places.


#5

Here’s the doc for optimizing option hessian:

hessian: ‘TRUE’ or ‘FALSE’ (the default): flag indicating whether to
          calculate the Hessian (via numeric differentiation of the
          gradient function in the unconstrained parameter space).

#6

Bistoon has a point, the hessian looks not ok.

in optim: fit1$hessian[2,2] = 987.6151
in optimizing: fit2$hessian[2,2] = -4000

Exactly in this entry [2,2] of the matrix and the “end”-element too.
Bob, could it be some for(0 … (n-1)) or for(1 … n) coping thing?


#7

No, it is an unconstrained parameterization thing.


#8

I’ve taken my answer in this topic (Optimizing function: incorrect estimation of the standard deviation of MLE). This code is useful (thanks Dr. Goodrich):

fit2 = optimizing(Stan_Normal,hessian=TRUE,data=list(n=n,x=x), init=list(mu=mean(x),sigma=sd(x)),draws=20000)
optimizing_mle = fit2$par;
optimizing_sd_mle = sqrt(diag(cov(fit2$theta_tilde)))
optimizing_sd_mle

-Bistoon