I have a question regarding the `stepsize` and `stepsize_jitter` parameters. I am using `rstan`, but I guess that the other interfaces have similar parameters.

I would like to know how exactly the “jittering” is done, i.e. how the supplied stepsize is randomised. The parameter `stepsize_jitter` is supposed to be in [0, 1] and I would guess that higher values randomise more. But what kind of randomisation takes place: is it uniform, Gaussian, …?

Moreover: Is it possible to supply my own randomised timesteps, e.g. an array (length `iter`) of timesteps to be used for the iterations? The specification for the parameter `stepsize` says it’s `double`, which makes me guess I cannot simply supply an array.

The docs for this are over here: 14.2 HMC Algorithm Parameters | Stan Reference Manual

The actual function is here: stan/base_hmc.hpp at develop · stan-dev/stan · GitHub
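For what it's worth, my reading of `sample_stepsize()` in that file is that the jitter is a uniform multiplicative perturbation: the nominal step size is scaled by a factor drawn uniformly from [1 - jitter, 1 + jitter]. Here is a minimal Python sketch of that reading. The function name `jittered_stepsize` is made up for illustration and is not part of any Stan interface, so please check it against the source before relying on it.

```python
import random


def jittered_stepsize(nominal, jitter, rng=random):
    """Hypothetical helper mirroring my reading of Stan's step size jitter.

    Scales `nominal` by a factor drawn uniformly from
    [1 - jitter, 1 + jitter]; `jitter` is expected to lie in [0, 1].
    """
    if not 0.0 <= jitter <= 1.0:
        raise ValueError("jitter must be in [0, 1]")
    if jitter == 0.0:
        return nominal  # no randomisation requested
    u = rng.random()  # uniform draw on [0, 1)
    return nominal * (1.0 + jitter * (2.0 * u - 1.0))


# Example: nominal step size 0.1 with jitter 0.5, so every draw
# falls between 0.05 and 0.15.
example_rng = random.Random(1)
example_draws = [jittered_stepsize(0.1, 0.5, example_rng) for _ in range(5)]
print(example_draws)
```

So with `stepsize_jitter = 1` the step size could, under this reading, land anywhere between 0 and twice the nominal value.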

I don’t know of a way to do that without modifying Stan itself.

Stepsize jitter is strongly not recommended and will be deprecated in upcoming versions of Stan.

Here again, thanks for the quick answer @bbbales2!

Why is that @betanalpha? Will the stepsize be randomised another way or will it simply be static throughout the MC iterations?

The stepsize only changes during the adaptation stage (after each MCMC draw). During the sampling stage the stepsize stays the same. You should be able to use `get_sampler_params` with `inc_warmup = TRUE` in RStan to see what’s actually being used.

Step size will vary during adaptation, and that history of step sizes can be recovered in RStan and PyStan as @bbbales2 notes (it’s included immediately in CmdStan so there you just need to save the warmup iterations). The step size in that adaptation phase is technically random, but only because the adaptation is conditioning on the random realization of the Markov chain.

Adding jitter to the step size will randomize it during the main sampling phase as well, which you can see in the same way by examining the non-warmup samples. Once this jitter is deprecated the step size will remain constant during the main sampling phase.

Jitter was originally introduced by Neal as a way of including occasionally small step sizes that might allow exploration of regions of high curvature. Unfortunately this requires being in the right place at the right time, and the probability that everything coincidentally aligns is quite small. Indeed, it mostly ends up compromising performance away from the regions of high curvature, where the fluctuations to smaller step sizes just require more costly transitions.

There is another argument for varying step sizes that comes from a theoretical ergodicity perspective. Here the variation is not so much a question of performance but rather mathematical convenience for getting the numerical integrator to “smear out” and cover the entire parameter space (and even then it’s hard to decouple the varying step size from the varying integration time it induces). This approach requires an exponential distribution of step sizes, which not only allows for smaller step sizes but also for much larger step sizes than nominal. Empirically this kind of variation does not seem to offer much in terms of performance or robustness, but that might not be unexpected given that it is aimed at static integration time methods and not the dynamic methods that we employ in Stan.
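To make the contrast between the two kinds of variation concrete, here is a rough Python sketch. It rests on two assumptions that are not stated above: that the exponential distribution is parameterised so its mean equals the nominal step size, and that jitter is a uniform multiplicative perturbation in [1 - jitter, 1 + jitter]. All function names are hypothetical.

```python
import random


def exponential_stepsize(nominal, rng):
    # Assumption: mean of the exponential equals the nominal step size.
    # random.expovariate takes the rate, i.e. 1 / mean.
    return rng.expovariate(1.0 / nominal)


def uniform_jitter_stepsize(nominal, jitter, rng):
    # Assumption: uniform scaling factor in [1 - jitter, 1 + jitter].
    return nominal * (1.0 + jitter * (2.0 * rng.random() - 1.0))


def fraction_above(draws, threshold):
    # Share of draws strictly larger than the threshold.
    return sum(d > threshold for d in draws) / len(draws)


nominal = 0.1
rng = random.Random(42)
exp_draws = [exponential_stepsize(nominal, rng) for _ in range(20000)]
uni_draws = [uniform_jitter_stepsize(nominal, 1.0, rng) for _ in range(20000)]

# Share of step sizes more than twice the nominal value.
print(fraction_above(exp_draws, 2 * nominal))
print(fraction_above(uni_draws, 2 * nominal))
```

Under these assumptions the exponential draws exceed twice the nominal step size with probability exp(-2), roughly 13.5%, whereas uniform jitter never does, even at the maximum jitter of 1.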

Whoops, my bad. I didn’t realize this was the case.