Using a fixed step size for HMC (or NUTS) and other sampler options

Is it possible to completely fix the stepsize__ parameter? Equivalently, I tried to run the HMC sampler with warmup=0, but I got flat lines and zero-likelihood/negative Poisson parameter warnings even when setting the initial parameters to the “true” values. Is that not allowed in Stan?

I was under the impression that with NUTS some optimization during the burn-in period may be necessary, but that for HMC only the step size required optimization, so nothing else was needed if a step size (as well as int_time) was provided.

Additionally, since I’m asking about sampler options, is there one to suppress all warnings during sampling? verbose=False doesn’t seem to eliminate all messages, and for long runs I would like to see only the percent progress.

Thanks.

Can anyone help with this? Just a yes or no on whether stepsize__ can be fixed all the way through, or the sampler run with warmup=0? Thanks in advance.

Sorry for the delayed response. This should be possible in cmdstan with the engine=static option. Is that not working for you?

Besides the timestep, Stan adapts what is called its Euclidean metric during warmup. Ideally this should reflect something like the curvature of parameter space near where the posterior mass is. If it isn’t initialized well, sampling can be super inefficient and can fail unless the timestep is super small.
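A gloss of my own, not a quote from the manual: with metric M the Hamiltonian being simulated is

H(q, p) = -\log \pi(q) + \frac{1}{2} p^\top M^{-1} p

and warmup adapts M^{-1} toward the posterior covariance (or just its diagonal), so a missing or badly initialized metric forces tiny timesteps to keep the integrator stable.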

No problem. I’m surprised you guys are able to reply to nearly all questions; I was just afraid my question would get lost at the bottom of the scroll.

I am using PyStan; is that not feasible with that interface (or Julia, or even R; I’ve never used CmdStan, but would be comfortable with a few other interfaces if necessary)? I am having trouble with parallel chains ending up with different parameters after the adaptation phase on a large inference problem, which is why I wanted to fix static HMC’s parameters from previous long runs. I forgot about the mass matrix, but I didn’t see it among the sampler parameters; maybe I missed it.

Does this all make sense?
Thanks for the attention.

Aaah, if you want to set the mass matrix too, then you will need to use cmdstan. I think it’s the only interface at this point that supports that, unfortunately. You might also need to use cmdstan to get the mass matrix from an already-adapted chain; I don’t think it’s available in RStan, and I’m not sure about PyStan.

Thanks, that’s very helpful. It would be some work to get all the Python input adapted to CmdStan, but I may go for it if reducing the number of parameters to be tuned by switching to HMC doesn’t turn out to help. Thanks again.

cmdstan accepts input in JSON format now.
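For example, here’s a minimal sketch of writing a PyStan-style data dict out as CmdStan JSON (the dict contents, the to_jsonable helper, and the file name are mine, just for illustration):

    import json
    import numpy as np

    # hypothetical data dict of the kind you would pass to pystan.stan(data=...)
    data = {"N": 10, "y": np.array([0, 1, 0, 0, 0, 0, 0, 0, 0, 1])}

    def to_jsonable(v):
        # json can't serialize numpy arrays directly, so convert them to lists
        return v.tolist() if isinstance(v, np.ndarray) else v

    with open("bernoulli.data.json", "w") as f:
        json.dump({k: to_jsonable(v) for k, v in data.items()}, f)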

To use JSON input for CmdStan, you’ll need the current (develop branch) versions of CmdStan and Stan. If the data input file ends in .json, it will be parsed as such; ditto input files for a specified initial mass matrix.

from the cmdstan manual:

Stan follows the JSON standard. A Stan input file in JSON notation consists of a JSON object which contains zero or more name-value pairs. This structure corresponds to a Python dictionary. The following is an example of JSON data for the simple Bernoulli example model:

    { "N" : 10, "y" : [0,1,0,0,0,0,0,0,0,1] }

Matrix data and multi-dimensional arrays are indexed in row-major order. For a Stan program with the data block

data {
  int d1;
  int d2;
  int d3;
  int ar[d1, d2, d3];
}

the following JSON input file would be valid:

     { "d1" : 2,
       "d2" : 3,
       "d3" : 4,
       "ar" : [[[0,1,2,3], [4,5,6,7], [8,9,10,11]],
               [[12,13,14,15], [16,17,18,19], [20,21,22,23]]]
}

JSON ignores whitespace. In the above examples, the spaces and newlines are only used to improve readability and can be omitted.

All data inputs are encoded as name-value pairs. The following table provides more examples of JSON data. The left column contains a Stan data variable declaration and the right column contains valid JSON data inputs.

It may be worth the extra work to get additional control over the MCMC parameters. Thanks.

I think we only have the diagonal of the Euclidean metric in PyStan… it might still be inside the “raw” holder object (fit.sim['samples'][i]).

Also, we still have not implemented initialization with a mass matrix.
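For what it’s worth, here is a rough, untested sketch of pulling those numbers back out of a PyStan 2 fit object, assuming get_adaptation_info() returns the adaptation dump as one text blob per chain (step size plus the diagonal of the inverse metric; the exact format is my assumption):

    import re

    info = fit.get_adaptation_info()[0]  # adaptation dump for chain 0, as text

    # the dump looks roughly like:
    #   # Step size = 0.8098
    #   # Diagonal elements of inverse mass matrix:
    #   # 0.9932, 1.0049
    step_size = float(re.search(r"Step size\s*=\s*([0-9.eE+-]+)", info).group(1))
    diag_text = info.split("inverse mass matrix:")[1]
    first_line = next(l for l in diag_text.splitlines() if l.strip("# \t"))
    inv_metric_diag = [float(x) for x in first_line.strip("# \t").split(",")]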

The mass matrix should be one of the more straightforward algorithm parameters to tune, though, right? It could be estimated from the vector of parameter values q alone (and its mean over the sampling phase, \bar{q}) as the expectation over the chain of the outer product:

M^{-1} \approx \mathbb{E} [ (q-\bar{q}) \otimes (q-\bar{q}) ]

If that is the case, I could recover it from the PyStan runs and use it as input for a CmdStan run instead of replicating the PyStan setup in CmdStan.

This would not be a diagonal matrix, however. I’m assuming that is what “dense” means for a Euclidean metric, and I’m not sure how to compute a diagonal matrix instead.

Is that correct, and is that how it’s computed? And what is the default for PyStan/CmdStan? Thanks.
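Concretely, this is how I would compute both variants from the draws (my sketch; draws is assumed to be an (n_iterations, n_parameters) array of post-warmup draws, and since Stan adapts on the unconstrained scale, constrained draws would only give an approximation):

    import numpy as np

    q_bar = draws.mean(axis=0)  # \bar{q}, the mean over the sampling phase
    centered = draws - q_bar

    # "dense" inverse metric: the full sample covariance E[(q - q_bar)(q - q_bar)^T]
    inv_metric_dense = centered.T @ centered / (len(draws) - 1)

    # diagonal variant: keep only the marginal variances
    inv_metric_diag = centered.var(axis=0, ddof=1)

(I believe the diagonal case is just the marginal variances as above, and that the diagonal metric, diag_e, is Stan’s default, but corrections welcome.)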

Follow-on question: is there a way to determine or set how many evaluations of log_prob occur with static HMC? Thanks in advance.

I figured out how to get a replicable number of log_prob evals across random seeds and data sizes, though I don’t know how to determine this number in advance. You have to invoke static HMC with adapt engaged=0:
./<prog> sample algorithm=hmc engine=static adapt engaged=0 data file=...

Interestingly, setting int_time and stepsize to fixed values did not, by itself, fix the number of log_prob evaluations.

What do you mean by the number of log_prob evaluations? In principle the log posterior gets evaluated once per iteration, if I’m not forgetting something; it doesn’t depend on the Hamiltonian trajectory. In static HMC the number of steps is fixed and the log posterior gets evaluated at the last point in the trajectory. With NUTS the number of steps is determined by the geometry of the parameter space and by whether the trajectory is “doubling back” toward its beginning, but not by the log posteriors at each of these points.

Not sure if I am missing something, or if you are talking about something different.

In HMC the density is evaluated once per leapfrog integrator step. The sampler parameter n_leapfrog__ (RStan and PyStan should have accessor functions) tells you the number of evals per iteration. This is why small stepsizes kill performance faster than complex calculations do. In my experience a usable model might be a million times more computationally intensive than a well-parametrized model from this perspective (stepsizes).
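For example, in PyStan 2 it should be something along these lines (sketch, assuming fit is an existing StanFit object):

    # one dict of sampler diagnostics per chain; keys include
    # 'n_leapfrog__', 'stepsize__', 'treedepth__', ...
    sampler_params = fit.get_sampler_params(inc_warmup=False)

    n_leapfrog = sampler_params[0]['n_leapfrog__']  # leapfrog steps per iteration, chain 0
    total_evals = int(sum(n_leapfrog))              # ~ total density/gradient evaluations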

Right. I was indeed missing the basic fact that the Hamiltonian is supposed to conserve energy and that the potential depends on position, so it needs to be evaluated at each step. It should, however, be a simple function of the number of time steps, right? So for a fixed trajectory length without step-size optimization (i.e., a fixed step size), the number should be constant.

@Sean_corp, I’m not sure what you are trying to get at, but a fixed integration time and step size should give a predictable number of evaluations.
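Something like this back-of-the-envelope count (my sketch, with illustrative values; it ignores any extra evaluations at initialization or for the accept/reject step):

    stepsize = 0.1      # fixed leapfrog step size
    int_time = 1.0      # fixed integration time
    num_samples = 1000  # post-warmup iterations

    steps_per_iteration = int(int_time / stepsize)   # leapfrog steps per draw
    total_evals = num_samples * steps_per_iteration  # ~10,000 gradient evaluations here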

This is not correct for adaptive HMC algorithms like the No-U-Turn Sampler and its derivative (there doesn’t seem to be a good name for it), the default Stan algorithm for inference. You are correct that vanilla HMC has a fixed number of steps and log_prob evaluations per iteration, but Stan isn’t using that by default. If you want to see this for yourself, take a look at the generated .hpp model file (easiest using CmdStan): edit it to add a static int counter variable to the model class, initialize it to 0 out of line, increment it by 1 in each log_prob call, and override the default destructor to print the count to std::cout on destruction. If you switch the algorithm with the command I listed above (essentially setting engine=static adapt engaged=0 to turn off adaptation), you can get a fixed number of evaluations per run with arbitrary seeds.

It does, however, report how many evaluations it performs in the n_leapfrog__ parameter, so you can track it.

I think either you use static HMC and, given a step size, have a fixed number of evaluations, or you track what is dynamically determined by the no-U-turn criterion, as @sakrejda mentioned. I’m still not sure what problem you are trying to solve with this, but those seem to be the two options you have.

This is for an evaluation task that will be useful when comparing Stan’s compiler + Math library against other alternatives in an end-to-end algorithm-agnostic way. Thank you both :)