Hi, I have a log density with 30-50 dimensions. Its normalizing constant is intractable, but the density itself is proper (integrable). The density is fairly regular: unimodal and asymptotically normal. The gradient and Hessian are both well-defined and can be computed analytically. Evaluating the log density and gradient is not too expensive, and evaluating the Hessian by numerical differentiation is still feasible.
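For concreteness, by "evaluating the Hessian by numerical differentiation" I mean central finite differences of the analytic gradient. A minimal sketch (the quadratic log density here is a stand-in, not my actual model):

```python
import numpy as np

def finite_diff_hessian(grad, x, eps=1e-5):
    """Central finite differences of an analytic gradient -> Hessian."""
    d = len(x)
    H = np.zeros((d, d))
    for i in range(d):
        e = np.zeros(d)
        e[i] = eps
        H[:, i] = (grad(x + e) - grad(x - e)) / (2 * eps)
    return 0.5 * (H + H.T)  # symmetrize away finite-difference noise

# Stand-in example: log density -0.5 * x' A x has gradient -A x, Hessian -A.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
grad = lambda x: -A @ x
H = finite_diff_hessian(grad, np.zeros(2))
```

This costs 2d gradient evaluations per Hessian, which is what makes it "still feasible" at 30-50 dimensions.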
To do inference with this density, I ran Stan's NUTS (10,000 warmup and 10,000 sampling iterations) and computed posterior probabilities for some hypotheses of interest (e.g. H0: beta = 0). Across many runs with different random seeds for the same model, the means of the draws are fairly stable, but the computed hypothesis probabilities are only stable to about the 0.01 level. The kernel density estimates of the NUTS draws also vary in quality: roughly 50% of the density plots look close to a normal distribution (as my model is asymptotically normal), while the others have weird shapes that deviate from normality.
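Incidentally, I suspect the run-to-run wobble at the 0.01 level is at least partly plain Monte Carlo error: a probability p estimated from n_eff effective draws has standard error sqrt(p(1-p)/n_eff). A quick illustration with assumed numbers (not my actual model):

```python
import numpy as np

def mc_se(p, n_eff):
    """Monte Carlo standard error of a probability estimated from n_eff draws."""
    return np.sqrt(p * (1 - p) / n_eff)

# With p ~ 0.05 and only ~1000 effective draws (autocorrelation can easily
# shrink 10000 raw draws this much), the SE is already ~0.007, so seed-to-seed
# differences at the 0.01 level are exactly what one would expect.
se = mc_se(0.05, 1000)
```

So the instability I see may say more about effective sample size than about NUTS itself.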
As far as I know, NUTS is gradient based, sharing some similarities with gradient descent in optimization. There is another class of Newton and quasi-Newton optimization methods that use both the gradient vector and the Hessian matrix. Using the Hessian seems to eliminate the need to choose a proper step size, a choice that is difficult and error-prone.
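To illustrate what I mean about the step size: on a quadratic objective a Newton step needs no step size and lands on the optimum in one iteration, while gradient descent must pick a learning rate (toy example, unrelated to my model):

```python
import numpy as np

# Minimize f(x) = 0.5 * x' A x - b' x  (gradient: A x - b, Hessian: A).
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -2.0])
x = np.zeros(2)

grad = A @ x - b
newton_x = x - np.linalg.solve(A, grad)  # one Newton step, no step size needed
gd_x = x - 0.1 * grad                    # gradient descent must choose a rate

exact = np.linalg.solve(A, b)            # true minimizer, for comparison
```

The Newton step hits `exact` in one shot; the gradient-descent iterate only moves partway and its progress depends entirely on the chosen rate.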
So I wonder: are there any Hessian-based Monte Carlo sampling methods that do not need to tune or adapt the step size, perhaps only needing to adapt the covariance of the multivariate Gaussian noise? Maybe for relatively regular densities and a modest number of dimensions like mine, using the Hessian for sampling could improve inference accuracy at acceptable computational cost?
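For what it's worth, the closest thing I've come across is Hessian-preconditioned (simplified manifold) MALA, where the negative inverse Hessian plays the role of the proposal covariance. A toy sketch on a Gaussian target, just to show the proposal shape; note a scalar step scale `eps` is still present, so this doesn't fully remove tuning (all numbers are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy target: zero-mean Gaussian with covariance S (log density up to a const).
S = np.array([[2.0, 0.8], [0.8, 1.0]])
P = np.linalg.inv(S)                 # precision = -Hessian of the log density
logp = lambda x: -0.5 * x @ P @ x
grad = lambda x: -P @ x

M = np.linalg.inv(P)                 # preconditioner: (-Hessian)^{-1}
L = np.linalg.cholesky(M)
eps = 0.9                            # a scalar step scale is still needed

def log_q(xp, x):
    """Log proposal density q(xp | x) = N(xp; x + (eps^2/2) M grad, eps^2 M)."""
    r = xp - (x + 0.5 * eps**2 * M @ grad(x))
    return -0.5 * r @ P @ r / eps**2  # since (eps^2 M)^{-1} = P / eps^2

x = np.zeros(2)
draws = []
for _ in range(5000):
    xp = x + 0.5 * eps**2 * M @ grad(x) + eps * L @ rng.standard_normal(2)
    log_alpha = logp(xp) - logp(x) + log_q(x, xp) - log_q(xp, x)
    if np.log(rng.uniform()) < log_alpha:   # Metropolis-Hastings correction
        x = xp
    draws.append(x)
draws = np.asarray(draws)
```

The Hessian removes the need for a covariance/mass-matrix adaptation here, but as far as I can tell a step-size-like scalar survives in every method of this family, which is part of my question.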
Special thanks to the Stan developers!