I am trying to use marginalization to draw some discrete parameters. After the model compiled correctly, I began sampling those parameters and got this message:
Chain 1: Gradient evaluation took 0.784 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 7840 seconds.
Chain 1: Adjust your expectations accordingly!
I was wondering whether this means that if I set iter=4000, the computation time will be roughly 4 × 7840 seconds.
More generally, can we make a rough estimate of the computation time when gradient evaluation is slow, and if so, how?
This will be true if and only if your model ends up requiring approximately 10 leapfrog steps per transition. It’s not uncommon for models to require an order of magnitude more. Unfortunately, the early part of warmup does not provide reliable estimates of the number of leapfrog steps per transition later in sampling. Unless you have some other way of building intuition about how computationally “hard” your model should be, it’s often best to just hit run and feel out the model’s behavior that way.
In general, the number of leapfrog steps required is tied to the step-size adaptation: models that can use larger step sizes during sampling need fewer leapfrog steps to simulate sufficiently long Hamiltonian trajectories for efficient HMC.
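As a concrete illustration of the arithmetic behind Stan’s message, here is a back-of-the-envelope sketch in Python. The numbers come from the message above; the leapfrog count of 10 is only Stan’s optimistic placeholder, not a measurement.

```python
# Back-of-the-envelope sketch of Stan's timing message; the leapfrog
# count of 10 is only Stan's optimistic placeholder, not a measurement.
grad_eval_time = 0.784   # seconds per gradient evaluation (from the message)
leapfrog_steps = 10      # often 10-100x higher for hard posteriors
transitions = 1000       # per-chain main sampling transitions

# Each leapfrog step costs roughly one gradient evaluation:
per_chain_seconds = grad_eval_time * leapfrog_steps * transitions
print(f"rough per-chain estimate: {per_chain_seconds:.0f} s")  # 7840 s
```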
Thank you for the explanation. I had been running the code for quite a while, trying to figure out whether the computation time matched the reference estimate. But I get your point!
A transition is the map from one iteration to the next, so here transition and iteration count the same step.
If you’re running four Markov chains in parallel, each with 1000 main sampling iterations, then the physical time for all of the chains to finish the main sampling phase would be 7840 seconds if each transition needed only 10 leapfrog steps. If you set each parallel chain to 4000 main sampling iterations, or run the four chains in serial, then it would take four times as long, as you suggest.
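Continuing the sketch above, the parallel-versus-serial distinction is just whether the per-chain times add up (hypothetical numbers):

```python
# Hypothetical worked example: wall-clock time for 4 chains of 1000
# transitions at 10 leapfrog steps each, continuing the sketch above.
per_chain_seconds = 7840
n_chains = 4

wall_time_parallel = per_chain_seconds             # chains run simultaneously
wall_time_serial = per_chain_seconds * n_chains    # chains run one at a time
print(wall_time_parallel, wall_time_serial)        # 7840 31360
```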
Keep in mind, however, that this estimate does not include warmup, which is often slower than the main sampling phase. In RStan and PyStan one sets the total number of iterations, not the number in each phase, so iter=4000 would actually correspond to 2000 warmup iterations and 2000 main sampling iterations unless warmup was set to an explicit value. If the chains get stuck on early iterations, then the adaptation may be performing suboptimally.
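For instance, a minimal sketch of the iter/warmup split, assuming the PyStan 2 interface; the model code here is a placeholder, and only the sampling arguments matter:

```python
import pystan  # assumes the PyStan 2 interface

# Placeholder model; only the sampling arguments matter for this example.
model_code = "parameters { real mu; } model { mu ~ normal(0, 1); }"
sm = pystan.StanModel(model_code=model_code)

# iter is the TOTAL per-chain count (warmup + main sampling), so iter=4000
# defaults to 2000 warmup + 2000 sampling; warmup= overrides that split.
fit = sm.sampling(iter=4000, warmup=2000, chains=4)
```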
The key limitation of this timing estimate is the assumption of only 10 leapfrog steps, which holds only when the target distribution is relatively well behaved. If your target distribution is much more complicated, then the dynamic Hamiltonian Monte Carlo sampler might need long trajectories and short step sizes, both of which result in more leapfrog steps per transition (each leapfrog step requiring one gradient evaluation).
If you’ve reached the main sampling phase but the transitions are taking much longer than estimated, then it’s likely that your target distribution is more complicated than the timing estimate naively presumes.
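One way to check this after the fact is to read the sampler diagnostics. Here is a sketch assuming PyStan 2 and the fit object from the example above; n_leapfrog__ records the leapfrog steps actually taken per transition:

```python
import numpy as np

# Sketch assuming PyStan 2 and the fit object from the example above:
# n_leapfrog__ records the leapfrog steps actually taken per transition.
sampler_params = fit.get_sampler_params(inc_warmup=False)
n_leapfrog = np.concatenate([chain["n_leapfrog__"] for chain in sampler_params])

mean_steps = n_leapfrog.mean()
print("mean leapfrog steps per transition:", mean_steps)

# Refine the naive estimate with the observed trajectory lengths:
print("refined per-chain estimate (s):", 0.784 * mean_steps * 1000)
```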