I’ve noticed that when running multiple chains, a few of them often finish quite quickly while one takes much longer than the rest. In addition, the time each iteration requires seems to stay roughly constant throughout the run.

I recall a talk by Michael Betancourt in which he explained that HMC finds the typical set of the distribution, and that we then get our samples as the algorithm traverses it. It’s also my understanding that the algorithm can adapt certain parameters on its own (during warmup) in order to be as efficient as possible.

Intuitively, I would attribute the difference in chain times to different starting points, but I’m not sure why, once the typical set has been found, the chains would still show such heterogeneity, or why iteration time wouldn’t decrease as the algorithm “maps out” the space.
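To make my question concrete, here is a toy back-of-the-envelope sketch of how I understand the cost structure (all numbers are made up, and I'm assuming the NUTS variant of HMC, where each iteration costs roughly `2**tree_depth` gradient evaluations and the tree depth depends on the step size each chain adapted during warmup):

```python
import numpy as np

# Hypothetical illustration of per-chain wall time in NUTS.
# Assumption: each iteration costs ~2**tree_depth gradient evaluations,
# and a chain that adapted a smaller step size needs deeper trees.
grad_cost = 1e-4   # seconds per gradient evaluation (made-up number)
n_iter = 1000      # post-warmup iterations

# Suppose three chains settled on similar tree depths, but one chain
# adapted a much smaller step size during warmup and so builds deeper
# trees, i.e. many more leapfrog steps per iteration.
tree_depths = {"chain_1": 5, "chain_2": 5, "chain_3": 5, "chain_4": 8}

for name, depth in tree_depths.items():
    seconds = n_iter * (2 ** depth) * grad_cost
    print(f"{name}: ~{seconds:.1f} s")
# With these made-up numbers, chains 1-3 take ~3.2 s each while
# chain 4 takes ~25.6 s, even though every chain is sampling the
# same typical set.
```

If this picture is right, it would explain both observations: per-iteration time is constant because the step size and tree depth are frozen after warmup, and one chain can be much slower simply because it adapted differently, not because it is still searching for the typical set. Is that the correct way to think about it?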