New Algorithm: Newtonian Monte Carlo paper

Stripping away their unfortunate notation, what they propose is a Metropolis-Hastings-within-block-Gibbs sampler that uses a regularized Hessian to build a local Gaussian approximation. The only novel contribution is using the conditional mean of that Gaussian approximation to define the step (similar to “Riemannian” Langevin methods) instead of taking a random Gaussian sample. They don’t go into many details, but because they define a deterministic proposal they would need some very careful corrections to get a well-defined Metropolis-Hastings acceptance procedure. For comparison, the “Riemannian” Langevin methods have noise around the gradient update and still require a nasty Hastings ratio in the acceptance probability.
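
To make the distinction concrete, here is a minimal sketch of the *stochastic* version of such a proposal, where the Hastings correction is straightforward. The `local_gaussian` helper and its regularization scheme are my own illustration, not the paper’s implementation; the point is that collapsing the draw to the conditional mean removes the proposal density that this correction relies on.

```python
import numpy as np
from scipy.stats import multivariate_normal

def local_gaussian(x, grad_log_p, hess_log_p, reg=1e-3):
    """Local Gaussian approximation at x: the mean is the Newton step,
    the covariance is the inverse of the regularized negative Hessian.
    (Hypothetical helper; the regularization is an assumption.)"""
    g = grad_log_p(x)
    H = hess_log_p(x)
    prec = -H + reg * np.eye(x.size)  # force positive definiteness
    cov = np.linalg.inv(prec)
    return x + cov @ g, cov           # (mean, covariance)

def mh_newton_step(x, log_p, grad_log_p, hess_log_p, rng):
    """One MH step that *samples* from the local Gaussian. The proposal
    is asymmetric, so the acceptance probability needs both the forward
    and reverse proposal densities; a deterministic proposal (always
    taking the mean) has a degenerate density, so this standard
    correction no longer applies."""
    mean_fwd, cov_fwd = local_gaussian(x, grad_log_p, hess_log_p)
    x_prop = rng.multivariate_normal(mean_fwd, cov_fwd)
    mean_rev, cov_rev = local_gaussian(x_prop, grad_log_p, hess_log_p)
    log_alpha = (log_p(x_prop) - log_p(x)
                 + multivariate_normal.logpdf(x, mean_rev, cov_rev)
                 - multivariate_normal.logpdf(x_prop, mean_fwd, cov_fwd))
    return x_prop if np.log(rng.uniform()) < log_alpha else x

# Quick smoke test on a 2-D standard normal target (illustration only).
rng = np.random.default_rng(0)
x = np.zeros(2)
for _ in range(1000):
    x = mh_newton_step(x, lambda y: -0.5 * y @ y, lambda y: -y,
                       lambda y: -np.eye(2), rng)
```

Note that the `log_q_rev - log_q_fwd` term exists only because the proposal has a density; replacing the random draw with its mean is exactly what creates the problem described above.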

Regardless of those details, the previous comments about the poor evaluation metrics are spot on. To avoid conflating underlying implementation differences they would need to compare something like effective sample size per equivalent gradient evaluation, where for this new method the number of equivalent gradient evaluations would be something like N * iterations (with N the dimension) to account for the cost of computing the Hessian. The fact that they are using GPUs on very parallelizable examples and only getting a factor of 5 improvement in effective sample size per wall time does not bode well for the actual performance of the method.
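
As a concrete illustration of that accounting, here is a small sketch; the per-iteration charges and all numbers below are hypothetical assumptions for exposition, not results from the paper:

```python
def ess_per_equivalent_gradient(ess, iterations, grad_evals_per_iter):
    """Cost-normalized comparison: effective sample size divided by the
    total work expressed in gradient-equivalent evaluations.
    (The accounting is an assumed convention, not from the paper.)"""
    return ess / (iterations * grad_evals_per_iter)

# For an N-dimensional target, a dense Hessian costs roughly N
# gradient-equivalent evaluations per iteration (e.g. via N
# Hessian-vector products), while a gradient-based method is charged
# its per-iteration gradient evaluations. Hypothetical numbers:
N = 100
print(ess_per_equivalent_gradient(ess=2_000, iterations=10_000,
                                  grad_evals_per_iter=10))  # gradient-based
print(ess_per_equivalent_gradient(ess=5_000, iterations=10_000,
                                  grad_evals_per_iter=N))   # Hessian-based
```

Under this kind of accounting a raw effective-sample-size advantage can easily disappear once the Hessian cost is charged per iteration.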

On a side note, I’m not sure what the default “Pyro” method is, but the fact that it’s that slow compared to Stan-on-CPU is honestly pretty shocking.