We’re trying to get our heads round how to estimate a good mass matrix for an SMC variant of Stan. We are therefore trying to understand how Stan currently identifies the mass matrix to use.
It appears that Stan currently estimates the covariance of the warmup draws (restricted to be either a diagonal or a dense matrix) and uses this estimated covariance as the inverse mass matrix. That seems sensible enough.
However, we wondered whether it would be better to use the gradients of the log posterior at the sample points (which have already been calculated) to estimate the Fisher information matrix, as the expectation of the outer product of the gradient with itself. Is there a reason why that would be a good or a bad idea? I sense that there may be an issue that we haven’t spotted, and I’d like to be aware of the quagmire before we walk into it!
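To make the comparison concrete, here is a minimal Python sketch of the two estimators on a toy two-dimensional Gaussian target. Everything in it is an illustrative assumption rather than anything taken from Stan: the covariance `SIGMA`, the helper names, and the use of exact independent draws in place of MCMC or SMC samples. For a Gaussian, the expected outer product of the gradient equals the precision matrix, so inverting it should recover roughly the same matrix as the sample covariance. The sketch says nothing about non-Gaussian posteriors, which is the case we are really asking about.

```python
import random

def outer_mean(vectors):
    """Average of v v^T over a list of 2-vectors, as a 2x2 nested list."""
    n = len(vectors)
    m = [[0.0, 0.0], [0.0, 0.0]]
    for v in vectors:
        for i in range(2):
            for j in range(2):
                m[i][j] += v[i] * v[j] / n
    return m

def inv2(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

# Hypothetical toy target: zero-mean Gaussian with this covariance.
SIGMA = [[4.0, 1.2], [1.2, 1.0]]
PREC = inv2(SIGMA)

def grad_log_density(x):
    """Gradient of log N(x; 0, SIGMA), which is -PREC @ x."""
    return [-(PREC[0][0] * x[0] + PREC[0][1] * x[1]),
            -(PREC[1][0] * x[0] + PREC[1][1] * x[1])]

def draw(rng):
    """One exact draw from N(0, SIGMA) via a hand-written Cholesky factor."""
    a = SIGMA[0][0] ** 0.5
    b = SIGMA[1][0] / a
    c = (SIGMA[1][1] - b * b) ** 0.5
    z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return [a * z0, b * z0 + c * z1]

rng = random.Random(0)
draws = [draw(rng) for _ in range(20000)]
grads = [grad_log_density(x) for x in draws]

# Estimator 1: sample covariance of the draws, used as the inverse mass matrix.
# The mean is known to be zero here, so no centering is applied.
cov_estimate = outer_mean(draws)

# Estimator 2: average outer product of the gradients, inverted to give an
# alternative candidate for the inverse mass matrix.
fisher_estimate = outer_mean(grads)
inv_metric_from_grads = inv2(fisher_estimate)

print("sample covariance:       ", cov_estimate)
print("inverse of grad estimate:", inv_metric_from_grads)
```

On this toy problem both printed matrices should be close to `SIGMA`, up to Monte Carlo error.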
Thanks
Simon