I figured out that for the Euclidean metric in HMC hyperparameter, there are choices like (“unit_e”, “diag_e” and “dens_e”), and the choice of the metric influences the convergence of the chains a lot. I read the STAN manual and found out that the “diag_e" and “dens_e” are estimated using some regularization method. I am super curious about what kind of regularization is used?

We take a penalized maximum likelihood point estimate. We just shrink the diagonal estimates toward zero and shrink the covariance matrix toward the diagonal. Nothing fancy. There is a bit more information in the reference manual, but the actual values of the shrinkage and regularization are in the code and not user configurable:

In the future, it’d be nice to have a fourth option where we use a low-rank plus diagonal approximation of covariance. The problem with the dense case is that it makes the leapfrog steps \mathcal{O}(N^2) in dimension due to generating a multivariate normal and it requires at least one \mathcal{O}(N^3) operation to Cholesky factor. I believe there may be more efficient low-rank updates we could apply directly to the Cholesky factors, but we’re not doing that now. With low-rank plus diagonal, those times go to \mathcal{O}(N \cdot J) and \mathcal{O}(N \cdot J^2) for rank J \ll N.

Could you also point me to some papers about the penalized maximum likelihood method? Is that something like graphical LASSO? And is the shrinkage value selected by some criteria, or is it fixed for all the situations?