One important detail: every iteration of Markov chain Monte Carlo is constructed from a numerical trajectory. In other words, if you run with 1000 iterations then Stan's dynamic Hamiltonian Monte Carlo will generate 1000 trajectories, keeping only one point from each trajectory to save in Stan's output. Each Markov chain then works through an entire ensemble of different trajectories.
The deterministic, numerical trajectories are generated with a symplectic leapfrog integrator. For more, see [1701.02434] A Conceptual Introduction to Hamiltonian Monte Carlo, especially Section 5.1.
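For intuition, here is a minimal sketch of one leapfrog trajectory for a toy target (a standard Gaussian, so the potential is U(q) = q²/2). The step size `eps`, step count `n_steps`, and the target itself are illustrative placeholders; Stan adapts these quantities dynamically rather than using fixed values like these.

```python
import numpy as np

def leapfrog(q, p, grad_U, eps, n_steps):
    """One numerical trajectory via the symplectic leapfrog integrator.

    q, p   : position (parameters) and momentum
    grad_U : gradient of the potential, U(q) = -log density(q)
    """
    p = p - 0.5 * eps * grad_U(q)      # initial half step for the momentum
    for _ in range(n_steps - 1):
        q = q + eps * p                # full step for the position
        p = p - eps * grad_U(q)        # full step for the momentum
    q = q + eps * p                    # final position step
    p = p - 0.5 * eps * grad_U(q)      # final half step for the momentum
    return q, p

# Toy target: standard Gaussian, U(q) = 0.5 * q^2, so grad_U(q) = q.
grad_U = lambda q: q
q, p = np.array([1.0]), np.array([0.5])
q_new, p_new = leapfrog(q, p, grad_U, eps=0.1, n_steps=20)
# Stan keeps only a single point from each such trajectory as one iteration.
```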
This is not uncommon. If the individual component models are misbehaving then the joint model will also misbehave, but even if all of the individual component models work okay on their own the joint model can still misbehave. In that case the source of the problem is in the interactions between the component models.
No, I mean that the posterior density function remains non-zero even as the parameter values become arbitrarily large, instead of decaying to zero.
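As a toy illustration of my own (not from the model in question): with flat priors and a likelihood that only identifies the sum of two location parameters, the posterior density is constant along the ridge mu1 + mu2 = const, so it never decays in that direction.

```python
import numpy as np

# Hypothetical non-identified model: only mu1 + mu2 enters the likelihood,
# so with flat priors the posterior density is flat along the ridge.
y = np.array([0.9, 1.1, 1.0])

def log_posterior(mu1, mu2, sigma=1.0):
    resid = y - (mu1 + mu2)
    return -0.5 * np.sum(resid**2) / sigma**2

# Moving arbitrarily far out along the ridge leaves the density unchanged.
print(log_posterior(0.0, 1.0))          # near the "typical" values
print(log_posterior(1e6, 1.0 - 1e6))    # same log density, very far away
```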
Stan handles constraints by transforming the constrained parameter space to an unconstrained space without any boundaries that need to be considered. The Hamiltonian Monte Carlo sampler explores this unconstrained space and the output is then mapped back to the constrained space before being returned to the user. If the posterior distribution concentrates away from the constraint boundaries then there shouldn't be much of a performance difference between exploring the constrained and unconstrained spaces, but if the posterior concentrates anywhere near a constraint boundary then you can see substantial differences.
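As a hedged illustration (a sketch of the idea, not Stan's internal code): for a positivity-constrained parameter sigma > 0, one samples an unconstrained variable x = log(sigma), maps back with sigma = exp(x), and corrects the target density with the log Jacobian of the inverse transform. The placeholder target below is just for demonstration.

```python
import numpy as np

def log_posterior_unconstrained(x, log_posterior_constrained):
    # Inverse transform back to the constrained space, sigma > 0.
    sigma = np.exp(x)
    # Add log |d sigma / d x| = log(exp(x)) = x as the Jacobian correction.
    return log_posterior_constrained(sigma) + x

# Placeholder constrained target density on sigma.
log_post = lambda sigma: -np.log1p(sigma**2)
print(log_posterior_unconstrained(0.0, log_post))  # evaluated at sigma = 1
```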
Some Markov chains running super quickly without actually moving suggests that the adaptation for those chains fell into a bad configuration, usually small step sizes and very large inverse metric elements. You can investigate this using the get_adaptation_info functions in RStan and PyStan.
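For example, with the PyStan 2.x interface (assuming a compiled model `sm` and a data dictionary `data`, both placeholders here), you can print each chain's adaptation summary, which reports the adapted step size and the diagonal of the inverse metric:

```python
# Assumes the PyStan 2.x interface; `sm` and `data` are placeholders.
fit = sm.sampling(data=data, chains=4, iter=1000, seed=1)

# One adaptation summary string per chain; look for unusually small step
# sizes or very large inverse metric elements in the misbehaving chains.
for chain_id, info in enumerate(fit.get_adaptation_info()):
    print("Chain {}:".format(chain_id))
    print(info)
```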