Real world Hamiltonian vs Artificial Hamiltonian for modelling the corresponding real world problem



  • “hand-waving arguments”
  • “speculations”
  • “half-awake/asleep/trance-state/hot-shower-induced intuitions”
  • “3rd year, second semester, 2nd lecture”-grade statistical physics concepts
  • “1st year, second semester, 5th lecture”-grade Classical Mechanics concepts
  • all the “basic ML/CS” PhD-level stuff as well, the Bishop book and friends

are ahead !!!

This is DANGER ZONE :)

THIS WILL (most likely) make no real SENSE, on PURPOSE.


Recently I came to a “deep” realisation about how “ML”/Bayesian inference is connected to Hamiltonian mechanics: through phase space / information theory and, most importantly, INDEPENDENCE.

I am writing this post because the above link seems to confirm my intuition that there might be some really awesome insight lurking here, which is not obvious to me, but maybe obvious to some of you? I do hope so. Hence this post. Please, enlighten me :)

Now, this Stack Overflow question really makes me obsessed, unable to let go of the question: what is the optimal choice of Hamiltonian (let’s call it H_{MCMC}) used for the “MCMC part”, for a system which is described by a “real-world Hamiltonian” (let’s call it H_{real-world})?

Given H_{real-world}, how can I find the optimal H_{MCMC} that “solves a Bayesian inference problem” on data generated by a dynamical system (let’s denote it S_{real-world}) whose equations of motion are defined by H_{real-world}, where the samples were taken according to the “ergodicity principle”, i.e. the idea of “replacing the ensemble average by a time average”?

But for now, let’s stick to the microcanonical (constant energy) ensemble.

I have the “feeling” that knowing the underlying Hamiltonian of the to-be-modelled REAL-WORLD system matters, since that system is ultimately dynamic in nature (hence ALL data is dynamic in nature, no matter whether it was generated by a Turing machine or by “the real world”).

So my feeling is that knowing the equations of motion for the real-world problem could provide some hints towards the “optimal Hamiltonian / sampling / whatnot” for the Hamiltonian used in the actual Stan calculation, where the data is simply a set of points in the phase space of a microcanonical ensemble with N degrees of freedom.
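To make the “data generated by H_{real-world} via a time average” idea concrete, here is a toy sketch of my own (a 1-D harmonic oscillator is an assumption for illustration, not anything from the thread): integrate Hamilton’s equations at fixed energy and treat points along the trajectory as the dataset, replacing the microcanonical ensemble average by a time average.

```python
import numpy as np

# Toy "real-world" data generator: leapfrog-integrate Hamilton's equations
# for H_real_world(q, p) = p**2/2 + q**2/2 (a 1-D harmonic oscillator),
# and collect q along the constant-energy orbit as the "dataset".

def simulate_trajectory(q0, p0, step_size=0.01, n_steps=100_000):
    q, p = float(q0), float(p0)
    qs = np.empty(n_steps)
    for i in range(n_steps):
        # One leapfrog step; the force is -dH/dq = -q
        p -= 0.5 * step_size * q
        q += step_size * p
        p -= 0.5 * step_size * q
        qs[i] = q
    return qs

qs = simulate_trajectory(q0=1.0, p0=1.0)   # total energy E = 1
print(qs.mean(), (qs**2).mean())           # time averages along the orbit
```

For this oscillator the time average of q^2 over many periods equals the energy E (virial theorem), so the printed second number is close to 1 — a small check that the “time average replaces ensemble average” bookkeeping is right.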


I don’t expect any real answers, just “gut feelings”, “speculations”, “collaborative daydreaming”. Just the typical discussion after a few cookies / beers in Amsterdam, after the conference dinner.




I’d strongly suggest reading Michael Betancourt’s “A Conceptual Introduction to Hamiltonian Monte Carlo” (on arXiv).

If you’re going to stick with the negative log density as the potential, the only freedom you have left is the choice of kinetic energy distribution. There’s a really nice paper on this which follows on nicely from Betancourt’s:
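As a concrete illustration of that split, here is a minimal HMC sketch (my own toy code, not Stan’s implementation): the potential is fixed as U(q) = -log pi(q), and the remaining freedom is the kinetic energy — here a Gaussian K(p) = p^2 / (2m) with a scalar mass m standing in for a mass matrix.

```python
import numpy as np

# Minimal HMC: H(q, p) = U(q) + K(p) with U(q) = -log pi(q) fixed by the
# target, and K(p) Gaussian with an (assumed) scalar mass parameter.

def hmc_sample(log_density, grad_log_density, q0, n_samples=2000,
               step_size=0.2, n_leapfrog=20, mass=1.0, seed=0):
    rng = np.random.default_rng(seed)
    q = np.atleast_1d(np.asarray(q0, dtype=float))
    samples = []
    for _ in range(n_samples):
        p = rng.normal(0.0, np.sqrt(mass), size=q.shape)  # momentum refresh
        q_new, p_new = q.copy(), p.copy()
        # Leapfrog integration of Hamilton's equations for H = U + K;
        # note dp/dt = -dU/dq = +grad log pi(q).
        p_new = p_new + 0.5 * step_size * grad_log_density(q_new)
        for _ in range(n_leapfrog - 1):
            q_new = q_new + step_size * p_new / mass
            p_new = p_new + step_size * grad_log_density(q_new)
        q_new = q_new + step_size * p_new / mass
        p_new = p_new + 0.5 * step_size * grad_log_density(q_new)
        # Metropolis correction on the total energy H
        h_old = -log_density(q) + 0.5 * np.sum(p**2) / mass
        h_new = -log_density(q_new) + 0.5 * np.sum(p_new**2) / mass
        if rng.random() < np.exp(h_old - h_new):
            q = q_new
        samples.append(q.copy())
    return np.array(samples)

# Toy target: standard normal, so U(q) = q^2/2 up to a constant.
log_density = lambda q: -0.5 * np.sum(q**2)
grad_log_density = lambda q: -q
draws = hmc_sample(log_density, grad_log_density, q0=[0.0])
print(draws.mean(), draws.std())
```

Changing `mass` (or, in general, the whole kinetic energy distribution) changes the dynamics and hence the sampler’s efficiency, but not the stationary distribution — which is exactly the freedom the papers above explore.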

Then you can complete the set by reading Livingstone and Betancourt’s paper on geometric ergodicity :-)


Hi Bob,

Thanks, indeed. That is true.

Also, it is interesting that this is a very recent paper!

Hmm, geometric ergodicity … wow, these are very addictive papers for a guy with a theor-cond-mat-phys background.

It is difficult to resist getting deeply lost in them.

Of course, a “stupid” question, but the potential energy can also be expressed in “many forms”, depending on the choice of “coordinates”.
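To pin down the “many forms of the potential, depending on coordinates” point, here is a small sketch of my own (the Exponential(1) toy density and the log transform are assumptions for illustration): for a positive parameter q with density pi(q), running HMC on z = log(q) changes the potential by a log-Jacobian term.

```python
import numpy as np

# Same target density, two coordinate systems, two potentials:
#   in q:        U_q(q) = -log pi(q)
#   in z=log q:  U_z(z) = -log pi(exp(z)) - z,
# where the extra -z is -log|dq/dz| = -log(exp(z)), the Jacobian correction.

def potential_q(q, log_pi):
    return -log_pi(q)

def potential_z(z, log_pi):
    return -log_pi(np.exp(z)) - z  # log-Jacobian term for q = exp(z)

# Toy density on q > 0: Exponential(1), log pi(q) = -q up to a constant.
log_pi = lambda q: -q
q = 2.0
z = np.log(q)
# The two potentials differ exactly by the log-Jacobian term log(q):
print(potential_q(q, log_pi), potential_z(z, log_pi))
```

So “the potential” is not unique: every reparameterisation gives a different U, and hence different HMC dynamics, for the same underlying inference problem.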

Somehow this HMC seems to be linking “physics” to “ML” very strongly.

It is very difficult to resist getting drawn too deeply into these thoughts. Somehow this connection feels very … “underrated”? That is my feeling, anyway. If there is a connection at all…

Nevertheless, I need to take this thinking a bit easy, but somehow I have the feeling that HMC turns “ML” problems into “stat-phys” problems.

I am pretty sure that after reading the literature on this, it will turn out that people have thought about this a lot already.


Ok, enough hand-waving. Thanks for the tip @Bob_Carpenter !

These papers are keepers.

And I need to watch some lectures on classical mechanics … again… It was in 1999 when I took that course, I think, tbh. I did not get the point back then :( Maybe now I will :)



Thanks @maxbiostat ! J.


I had a super quick look, actually just a search: I searched for “lagr” in all three papers. To my surprise, no match. Well, maybe this is also a question: what does the “Lagrangian formalism” mean (or not mean) for HMC, if it means anything? I am sure there is literature on this too already, plenty even, possibly, but who knows.

It’s always nice to pull some techniques from one field to the other… “if it fits”. After all, there is “nothing new under the sun”. :) Or there is.

Maybe I will look into this one day. Or ask a few ppl who are not on this forum, if/when I meet them.

Just a quick random thought. Maybe someone finds it interesting.

These are very “difficult to let go” questions. :)