Dear Stan Community,
Here comes yet another wall of text, born of my frustration in
trying to understand “HMC”.
So, here it comes. BUT !
BE ADVISED :
it might not make much sense (it might sound like mumbo-jumbo / throwing around ideas / looking for inspiration).
So, WHY HAMILTONIAN ???
Why not something else ? Say the acceleration is a=(F/m)^{0.90} ? => This is not compatible with energy conservation. But why conserve energy at all ? In Metropolis MCMC it is T that is constant, not the energy, FOR EXAMPLE, so it REALLY should not matter at ALL (as I “explain” below)!
So, the problem is worse than just some crazy person’s crazy rambling around.
If we use the “wrong Newton equation” in our leapfrog integrator (a molecular dynamics type of algorithm; I did a lot of that, hence my obsession with this topic), then we are not going to conserve energy. Big deal. We can calculate the new energy and accept or reject the move according to the traditional Metropolis method.
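To make this concrete, here is a minimal Python sketch of what I mean. It is a toy, not how Stan does it. Everything specific in it is my own assumption: a 1D standard normal target (so U(q) = q^2/2), mass m = 1, the made-up function names (`true_force`, `wrong_force`, `energy`, `leapfrog`, `sample`), and a sign that I put into the wrong law, a = sign(F)|F|^{0.90}, so that it is defined for negative forces. The Metropolis test always uses the true energy, no matter which force produced the proposal. As far as I can tell this stays a legitimate Metropolis proposal, because leapfrog with a force that depends only on position is still reversible and volume preserving; the wrong force should only cost acceptance rate.

```python
import math
import random

def true_force(q):
    # Toy target: standard normal, U(q) = q^2 / 2, so F = -dU/dq = -q.
    return -q

def wrong_force(q):
    # Hypothetical "wrong Newton law" with m = 1: a = sign(F) * |F|^0.9.
    f = true_force(q)
    return math.copysign(abs(f) ** 0.9, f)

def energy(q, p):
    # True Hamiltonian of the toy target: potential plus kinetic energy (m = 1).
    return 0.5 * q * q + 0.5 * p * p

def leapfrog(q, p, force, eps, n_steps):
    # Half momentum step, then alternating full steps, ending on a half momentum step.
    p = p + 0.5 * eps * force(q)
    for step in range(n_steps):
        q = q + eps * p
        last = step == n_steps - 1
        p = p + (0.5 if last else 1.0) * eps * force(q)
    return q, p

def sample(force, n_samples, eps=0.2, n_steps=10, seed=0):
    rng = random.Random(seed)
    q, draws, accepted = 0.0, [], 0
    for _ in range(n_samples):
        p = rng.gauss(0.0, 1.0)  # fresh momentum every iteration
        q_new, p_new = leapfrog(q, p, force, eps, n_steps)
        # Metropolis test on the TRUE energy, whichever force proposed the move.
        d_h = energy(q_new, p_new) - energy(q, p)
        if rng.random() < math.exp(min(0.0, -d_h)):
            q = q_new
            accepted += 1
        draws.append(q)
    return draws, accepted / n_samples

if __name__ == "__main__":
    for name, force in [("true force", true_force), ("wrong force", wrong_force)]:
        draws, rate = sample(force, 5000)
        mean = sum(draws) / len(draws)
        var = sum((x - mean) ** 2 for x in draws) / len(draws)
        print(f"{name}: acceptance={rate:.3f} mean={mean:.3f} var={var:.3f}")
```

One caveat on my own toy: in 1D every position-only force is the gradient of some potential, so the “wrong” dynamics here still conserve a (different) energy. It only tests “wrong energy”, not truly non-Hamiltonian dynamics; that would need at least two dimensions.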
Ok, just to make it clear what I am talking about. Metropolis in itself is “worthless”, to be a bit “drastic” (that is not literally true, but it is impractical, so almost worthless: if Stan had only Metropolis then there would be no Stan community).
So, then HMC is basically - as far as I and I OVERSTAND :) - Metropolis + microcanonical (energy conserving) MD (molecular dynamics).
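To pin down what I mean by “Metropolis + MD”, this is the textbook construction as I understand it. The notation is my own choice: q is the parameter vector, p an auxiliary momentum, M a mass matrix, \pi(q) the posterior density, and (q^*, p^*) the end point of the simulated trajectory.

```latex
H(q, p) = U(q) + K(p), \qquad U(q) = -\log \pi(q), \qquad K(p) = \tfrac{1}{2}\, p^{\top} M^{-1} p

\frac{dq}{dt} = \frac{\partial H}{\partial p} = M^{-1} p, \qquad
\frac{dp}{dt} = -\frac{\partial H}{\partial q} = -\nabla U(q)

\alpha = \min\!\left(1,\ \exp\!\big(H(q, p) - H(q^{*}, p^{*})\big)\right)
```

If the trajectory conserved H exactly, the acceptance probability \alpha would be 1 for every proposal; the Metropolis step only has to correct for the integration error.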
So, WHY WHY WHY WHY WHY ???
Why is MD such a big deal ??? Why is the energy (defined by the Hamiltonian) such a big deal ? Why HAMILTONIAN at all ???
I am pretty sure the answer is buried somewhere in classical mechanics: why is the Hamiltonian formalism so successful ? Why not just use the forces and Newton’s
equations ?
BUT OK, in physics the Hamiltonian is “good” because “energy conservation is a fundamental law”, time translation symmetry, etc., AND WAY BEYOND THAT, the Hamiltonian in physics is “THE GOD”.
BUT IN HMC ? For Bayesian inference ??? WHY ???
I mean, HMC was originally invented for physical systems, to calculate physical, macroscopically measurable thermodynamic variables. It had nothing to do with Bayesian inference.
So… why HAMILTONIAN ? Does it really have to be Hamiltonian ?
If yes, WHY ?
:) :) :) :) :)
I have the feeling that the answer is super simple, and most likely it has something to do with weakly interacting systems and perturbation theory (in other words, the statistical problems can be thought of as some sort of physical system with independent components which interact weakly; then the Hamiltonian makes a lot of sense - due to perturbation theory - but this breaks down in strongly interacting systems… so… ??? … bla … :) ).
(But this is just a guess, which I “deduced” intuitively from Turing machine / Kolmogorov complexity / MDL / Solomonoff type of arguments related to Occam’s razor…)
OK…
In case the answer is obvious to someone, please let me know.
But the question is - in essence - WHY the HAMILTONIAN ? For a problem where
there is no HAMILTONIAN which describes the observed data points ??? :) :) :)
See ?
Have a good night :)
I feel very confused.
Cheers,
Jozsef