Sampler parameters which the typical user might need to set (hmc_nuts_diag_e_adapt only)

In NYC this summer we talked at one point about simplifying the default sampling function (hmc_nuts_diag_e_adapt) in PyStan/httpstan/others(?) by fixing some of the sampler parameters at their default values (i.e., not allowing the typical user to set them). (Advanced users would have lots of alternatives.)

I recall there being a consensus that the typical user should not manipulate most of the parameters. I’m having trouble recalling which ones the typical user should potentially be allowed to change. Can someone help? I think @betanalpha had opinions about this.

For reference, here are the sampler parameters I’m thinking about:

  • mass matrix
  • stepsize
  • stepsize_jitter
  • max_depth
  • delta
  • gamma
  • kappa
  • t0
  • init_buffer
  • term_buffer
  • window
  • init_radius

The thinking was that some of these sampler parameters depend on each other, so manipulating them in isolation is generally not going to yield good results for the typical PyStan/RStan/*Stan user.

Added 2019-11-05: Added init_radius to discussion.


stepsize_jitter should be deprecated regardless.

delta, gamma, kappa, and t0 configure the dual averaging of the integrator step size. The last three probably shouldn’t be touched by anyone not heavily versed in Nesterov optimization theory, but delta sets the target acceptance statistic and would nominally be exposed in case of divergences. That said, it might be easier to avoid delta entirely and have users set the step size themselves (without the adaptation being able to change it).
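To make the roles of these four concrete, here is a minimal sketch of the dual-averaging update from the NUTS paper (Hoffman &amp; Gelman 2014, Algorithm 6); the function name and the simplified loop are illustrative, not Stan’s actual implementation:

```python
import math

# Illustrative sketch of dual-averaging step size adaptation showing
# what delta, gamma, kappa, and t0 each do. Not Stan's implementation.
def dual_averaging(accept_stats, mu, delta=0.8, gamma=0.05, kappa=0.75, t0=10.0):
    """Return the smoothed step size after one warmup pass.

    accept_stats: per-iteration acceptance statistics alpha_t
    mu:           log-step-size shrinkage target (log(10 * eps_0) in Stan)
    delta:        target acceptance statistic
    gamma, kappa, t0: dual-averaging tuning constants
    """
    h_bar, log_eps_bar = 0.0, 0.0
    for t, alpha in enumerate(accept_stats, start=1):
        eta = 1.0 / (t + t0)
        # Running estimate of how far acceptance is from the target delta.
        h_bar = (1.0 - eta) * h_bar + eta * (delta - alpha)
        log_eps = mu - math.sqrt(t) / gamma * h_bar
        # Polyak-style averaging with forgetting rate kappa.
        w = t ** -kappa
        log_eps_bar = w * log_eps + (1.0 - w) * log_eps_bar
    return math.exp(log_eps_bar)
```

When acceptance runs below delta the adapted step size shrinks, and above delta it grows, which is why raising delta is the usual knob for reducing divergences.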

inv_metric (the mass matrix), init_buffer, term_buffer, and window configure the adaptation of the inverse metric elements. It’s not that these shouldn’t be exposed to the user, but rather that they should be exposed only in certain patterns. For example, if a user wants to specify their own inverse metric components and not allow them to be changed, then they might call something like adaptation=static_inv_metric, where inv_metric is set and window is forced to zero at the same time.
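As a sketch of that pattern (the name `static_inv_metric` and the keyword arguments are hypothetical, not an existing PyStan/httpstan API), a wrapper could accept a diagonal inverse metric and force `window=0` in the same call:

```python
# Hypothetical helper: builds keyword arguments for a sampler call with a
# fixed, user-supplied diagonal inverse metric. The names `inv_metric`
# and `window` mirror the parameters discussed above.
def static_inv_metric(inv_metric, **kwargs):
    if any(v <= 0 for v in inv_metric):
        raise ValueError("inverse metric elements must be positive")
    # window=0 means the supplied metric is never adapted during warmup.
    kwargs.update(inv_metric=list(inv_metric), window=0)
    return kwargs
```

For example, `static_inv_metric([1.0, 0.5, 2.0], max_depth=12)` yields kwargs in which the user’s metric can never be overwritten, so the two settings can’t drift out of sync.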


Perfect. So just stepsize for beginners – and max_depth?

While I have everyone’s attention. What about init_radius? Does anyone use this? I always specify initial values directly. I might have used init_radius once in my life.



Should we get rid of them totally? Which params would a user need from a Stan model in order to restart a run?

@betanalpha I’m only thinking about hmc_nuts_diag_e_adapt at the moment (to use the Stan services function name). This is going to be what users use when they call the default sample function in PyStan 3.

Are you saying that the user should ignore delta and manually set stepsize with no stepsize adaptation? (This is hmc_nuts_diag_e right?)

Stepsize adaptation should largely be limited to defaults, with adapt_delta or another method to lower the initial fitted step size in the presence of divergences. max_depth should be accessible for those with particularly long trajectories.
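A minimal sketch of that workflow (the helper name is made up for illustration): when divergences appear, the user nudges the target acceptance statistic toward 1, which lowers the adapted step size, instead of touching gamma, kappa, or t0:

```python
# Hypothetical helper illustrating the suggested divergence workflow:
# raise adapt_delta toward 1 so the adapted step size comes out smaller.
def raise_adapt_delta(current_delta, step=0.04, ceiling=0.999):
    if not 0.0 < current_delta < 1.0:
        raise ValueError("adapt_delta must be in (0, 1)")
    return min(ceiling, current_delta + step)
```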

Every time I’ve used it I was dealing with what became clear in hindsight was a bad model, and I think that’s the circumstance in which it’s employed most of the time. That said, I don’t see any reason to remove it, as there are principled ways to use it based on prior choice and the general scales in the problem.

As we talked about at the developer retreat, this function is probably too overloaded for most users. Instead of having a suite of adaptation parameters that configure the adaptation routines, we talked about breaking the main service route into different functions, each designed for a specific kind of adaptation: for example, nominal, restart, init_metric, reduced_stepsize, or the like.

No. By default they shouldn’t set anything; the question is rather what happens when divergences pop up and they have to modify the step size.

In the interest of simplicity (fewer possible arguments), one could specify initial values directly instead of using init_radius, right?
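Right, and the radius behavior is easy to emulate on the interface side. Here’s a sketch (plain Python, not an actual PyStan function) that generates explicit inits mimicking init_radius:

```python
import random

# Sketch: emulate init_radius by drawing each unconstrained parameter
# uniformly from (-radius, radius), returning an explicit inits dict.
def uniform_inits(param_names, radius=2.0, seed=None):
    rng = random.Random(seed)
    return {name: rng.uniform(-radius, radius) for name in param_names}
```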

Ok. This sounds really good. In the interest of having a minimum viable PyStan 3, which of these shall I implement? Or should I put in the docs “if you encounter a problem with divergences, you need to use cmdstan or pystan 2”?

Instead of having them all as function arguments what if we allowed the user to specify a config file for the sample? So it would be like

my_mod.sample(blah, blah1, sampler_config="./my_settings.conf")

So then we reduce the args in the function while also letting people fiddle with things if they want

This isn’t a common technique, in Python libraries at least. A key disadvantage is that it requires working with the file system, which can vary dramatically depending on the OS. We also want people without file systems (say, online Jupyter users) to be able to configure things.

fwiw, here’s the current set of params for CmdStanPy’s sample method:

That’s fine, I was mostly throwing it out there. If the goal is just to shrink the signature, some sort of dictionary config would probably do that as well.
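A sketch of the dictionary version (the defaults and key names here are illustrative, not PyStan 3’s actual values): merge user overrides into the defaults and reject unknown keys, so the sample signature stays small:

```python
# Illustrative defaults; not the actual PyStan 3 values.
DEFAULTS = {"stepsize": 1.0, "max_depth": 10, "adapt_delta": 0.8}

def resolve_sampler_config(overrides=None):
    """Merge user overrides into the defaults, rejecting unknown keys."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown sampler parameters: {sorted(unknown)}")
    return {**DEFAULTS, **overrides}
```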

In my opinion a minimal viable interface would have nominal and a way to adjust the step size, through modifying adapt_delta or whatever. Modifying the metric by hand is definitely a more advanced feature.

One possibility that has been discussed is having a stateful object that initializes to the default configuration which can then be modified through mutator methods. This was the intention of the argument configuration validator class used in CmdStan, but it’s not clear how that would work with a C++ API being accessed by both Python and R clients.
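On the Python side that object might look something like this (a sketch, not the CmdStan validator class): defaults on construction, with chainable mutator methods that validate as they go:

```python
# Sketch of a stateful sampler configuration with validating mutators.
# Parameter names and defaults are illustrative.
class SamplerConfig:
    def __init__(self):
        self.stepsize = 1.0
        self.max_depth = 10
        self.adapt_delta = 0.8

    def set_adapt_delta(self, value):
        if not 0.0 < value < 1.0:
            raise ValueError("adapt_delta must be in (0, 1)")
        self.adapt_delta = value
        return self  # chainable

    def set_max_depth(self, value):
        if value < 1:
            raise ValueError("max_depth must be >= 1")
        self.max_depth = int(value)
        return self
```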

I think we are both talking about an input config and an output config with defaults, in which case yes, I agree.

In my mind this feels like an R tempfile or a Python pickle that we point CmdStan to, right?

Specifying initial values is seriously annoying. The radius (or the R equivalent) is very handy. I’m happy to get information, but I prefer software that doesn’t take every opportunity to beat me over the head with a stick in case my model is not somebody’s idea of optimal.


The problem is that this introduces a file system intermediary, which breaks up the current flow of the interfaces, where everything is local to the interface environment.

How do you prevent typical users from setting a feature that an advanced user could set?

I find a need to lower this in 100-coefficient logistic regression simulations. Those can be really numerically unstable at wide initial values.

What kind of models do you have where that’s necessary? We almost never have to do that with the more recent versions of Stan.

That’s a huge burden for large models.

Does “nominal” just mean default here?

Obfuscation through documentation – they would just not be shown in the introductory treatments and examples. Many of the existing arguments have this flavor already.

Yes, “nominal” in the physics/engineering sense of default.