I just wanted to complement the discussion with some formal statements.
Markov chain Monte Carlo, including Hamiltonian Monte Carlo, is a stochastic algorithm. In theory each time we run the algorithm we generate a different Markov chain, and hence different Markov chain Monte Carlo estimators. Assuming that the algorithm is reasonably well-behaved, these estimators converge to the same answer only in the limit of infinitely long Markov chains. In other words, even in theory, realized finite Markov chains, and the subsequent estimators, are not reproducible!
Although the finite-length behavior isn’t deterministically reproducible, for especially well-behaved Markov chain Monte Carlo algorithms it can be characterized probabilistically using a Markov chain Monte Carlo central limit theorem. This is why we have to check so many diagnostics: the useful diagnostics indicate obstructions to that central limit theorem, and hence to our ability to quantify the behavior of expectation value estimators from finite-length Markov chains.
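To make the diagnostic idea concrete, here is a minimal sketch of one standard diagnostic, split-R-hat, which flags poor mixing between and within chains. This is an illustrative toy implementation in Python with numpy, not Stan’s production code; the input array shape and function name are my own choices.

```python
import numpy as np

def split_rhat(chains):
    """Split-R-hat diagnostic: values well above 1 signal that the
    chains have not mixed, one obstruction to the MCMC CLT.
    `chains` is a hypothetical (n_chains, n_draws) array of draws."""
    n_chains, n_draws = chains.shape
    half = n_draws // 2
    # Split each chain in half so within-chain drift also shows up.
    splits = np.concatenate([chains[:, :half], chains[:, half:2 * half]], axis=0)
    m, n = splits.shape
    chain_means = splits.mean(axis=1)
    B = n * chain_means.var(ddof=1)           # between-chain variance
    W = splits.var(axis=1, ddof=1).mean()     # mean within-chain variance
    var_plus = (n - 1) / n * W + B / n        # pooled variance estimate
    return np.sqrt(var_plus / W)

# Four well-mixed stationary chains should give split-R-hat close to 1.
rng = np.random.default_rng(0)
well_mixed = rng.normal(size=(4, 1000))
print(split_rhat(well_mixed))
```

When a diagnostic like this fails, the central limit theorem quantification below cannot be trusted in the first place.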
What does this say about reproducibility? Let’s say that we have two realized Markov chains, generated either from the same algorithm or from different algorithms. If both algorithms satisfy Markov chain Monte Carlo central limit theorems then the difference between the Markov chain Monte Carlo estimators of the expectation value \mathbb{E}_{\pi}[f] is approximately normally distributed,
\hat{f}_{1} - \hat{f}_{2} \sim \text{normal}(0, \sqrt{ \text{MCMC-SE}_{1}[f]^{2} + \text{MCMC-SE}_{2}[f]^{2} } ),
where \text{MCMC-SE} refers to the Markov chain Monte Carlo standard errors of the estimators. These are estimated along with the Markov chain Monte Carlo estimators themselves; for example in Stan they are displayed in the print/stansummary function outputs. In summary, Markov chains, and the subsequent Markov chain Monte Carlo estimators, are not deterministically reproducible, but we can quantify their variation probabilistically.
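The comparison above can be sketched in a few lines of Python. Here the MCMC-SE is estimated with batch means, one standard technique among several; the chains are stand-ins generated as independent draws purely for illustration, so every name and constant below is an assumption, not Stan output.

```python
import numpy as np

def mcmc_se(draws, n_batches=20):
    """Batch-means estimate of the Markov chain Monte Carlo standard
    error of the mean; one common estimator, not the only one."""
    batch_size = len(draws) // n_batches
    batch_means = (draws[:batch_size * n_batches]
                   .reshape(n_batches, batch_size).mean(axis=1))
    return batch_means.std(ddof=1) / np.sqrt(n_batches)

# Two hypothetical realized chains targeting the same distribution.
rng = np.random.default_rng(1)
chain1 = rng.normal(size=5000)
chain2 = rng.normal(size=5000)

f1, f2 = chain1.mean(), chain2.mean()
# Standard error of the difference, per the formula above.
se = np.sqrt(mcmc_se(chain1) ** 2 + mcmc_se(chain2) ** 2)
z = (f1 - f2) / se
# If the central limit theorems hold, |z| should rarely exceed ~3.
print(f"difference = {f1 - f2:.4f}, z = {z:.2f}")
```

A large |z| would indicate that the two estimators disagree by more than their quantified probabilistic variation allows, suggesting that at least one central limit theorem does not actually hold.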
All of these considerations are in theory. In practice the stochastic generation of Markov chains is determined by a pseudo-random number generator, which is itself initialized with a seed (although most seeds are actually pretty bad; see for example the discussion in Section 2.2.2 of Rumble in the Ensemble). If the computer hardware, operating system, and code are all fixed then the pseudo-random number generator output, and hence the generated Markov chains, should also be fixed and exactly reproducible.
Even small changes to the hardware, operating system, or code, however, can cause substantial changes in the pseudo-random number generator output and hence the Markov chain output. Indeed the better designed the pseudo-random number generator and the Markov chain Monte Carlo algorithm, the more sensitive the outputs will be to small changes! In practice all of these changes are essentially equivalent to changing the pseudo-random number generator seed. If we can’t control everything then we effectively cannot control the seed, and all we can expect in terms of reproducibility is the probabilistic variation afforded by the Markov chain Monte Carlo central limit theorem (and then only if the central limit theorem actually holds!).
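The seed behavior is easy to demonstrate with a toy sampler. Below is a minimal random-walk Metropolis implementation in Python targeting a standard normal; it is a hypothetical stand-in for Stan’s Hamiltonian Monte Carlo, with all names and step sizes my own. With everything fixed, including the seed, the realized chains are bitwise identical; change only the seed and the realized chain changes completely.

```python
import numpy as np

def random_walk_metropolis(seed, n_draws=2000, step=1.0):
    """A toy random-walk Metropolis sampler targeting a standard
    normal; purely illustrative, not Stan's algorithm."""
    rng = np.random.default_rng(seed)
    x = 0.0
    draws = np.empty(n_draws)
    for i in range(n_draws):
        proposal = x + step * rng.normal()
        # Accept with probability min(1, pi(proposal) / pi(x));
        # for a standard normal the log ratio is (x^2 - proposal^2) / 2.
        if np.log(rng.uniform()) < 0.5 * (x ** 2 - proposal ** 2):
            x = proposal
        draws[i] = x
    return draws

# Same seed, same code: bitwise-identical chains.
print(np.array_equal(random_walk_metropolis(42), random_walk_metropolis(42)))  # True

# Different seed: a different realized chain, reproducible only
# probabilistically via the central limit theorem.
print(np.array_equal(random_walk_metropolis(42), random_walk_metropolis(7)))   # False
```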