Sampler parameters which the typical user might need to set (hmc_nuts_diag_e_adapt only)

In NYC this summer we talked at one point about simplifying the default sampling function (hmc_nuts_diag_e_adapt) in PyStan/httpstan/others(?) by fixing some of the sampler parameters at their default values (i.e., not allowing the typical user to set them). (Advanced users would have lots of alternatives.)

I recall there being a consensus that the typical user should not manipulate most of the parameters. I’m having trouble recalling which ones the typical user should potentially be allowed to change. Can someone help? I think @betanalpha had opinions about this.

For reference, here are the sampler parameters I’m thinking about:

  • mass matrix
  • stepsize
  • stepsize_jitter
  • max_depth
  • delta
  • gamma
  • kappa
  • t0
  • init_buffer
  • term_buffer
  • window
  • init_radius

The thinking was that some of these sampler parameters depend on each other, so manipulating them correctly is hard, and tweaking them is generally not going to yield good results for the typical PyStan/RStan/*Stan user.

Edit (2019-11-05): added init_radius to the discussion.
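For reference, here is a small sketch that records the CmdStan default value of each parameter in the list. It also groups the parameters the way the replies below do: dual averaging of the step size, inverse metric adaptation windows, and so on. The names `SAMPLER_DEFAULTS`, `PARAMETER_GROUPS`, and `group_of` are hypothetical, made up for illustration. The defaults are the documented CmdStan values for NUTS with diagonal metric adaptation.

```python
# Documented CmdStan defaults for NUTS with diagonal metric adaptation
# (hmc_nuts_diag_e_adapt). The mass matrix is omitted: by default the
# diagonal inverse metric starts as a vector of ones.
SAMPLER_DEFAULTS = {
    "stepsize": 1.0,
    "stepsize_jitter": 0.0,
    "max_depth": 10,
    "delta": 0.8,
    "gamma": 0.05,
    "kappa": 0.75,
    "t0": 10.0,
    "init_buffer": 75,
    "term_buffer": 50,
    "window": 25,
    "init_radius": 2.0,
}

# Hypothetical grouping: parameters in the same group interact, so changing
# one of them in isolation can undo the tuning implied by the others.
PARAMETER_GROUPS = {
    "step size dual averaging": ("stepsize", "delta", "gamma", "kappa", "t0"),
    "inverse metric adaptation windows": ("init_buffer", "term_buffer", "window"),
    "trajectory length": ("max_depth",),
    "initialization": ("init_radius",),
    "step size jitter": ("stepsize_jitter",),
}


def group_of(name: str) -> str:
    """Return the group a sampler parameter belongs to."""
    for group, members in PARAMETER_GROUPS.items():
        if name in members:
            return group
    raise KeyError(f"unknown sampler parameter: {name}")


if __name__ == "__main__":
    for name, value in SAMPLER_DEFAULTS.items():
        print(f"{name:16} {value!s:6} {group_of(name)}")
```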


stepsize_jitter should be deprecated regardless.

delta, gamma, kappa, and t0 configure the dual averaging of the integrator step size. The last three probably shouldn’t be touched by anyone not heavily versed in Nesterov optimization theory, but delta defines the target adaptation statistic and would nominally be exposed to deal with divergences. That said, it might be easier to avoid delta entirely and have users set the step size themselves (without the adaptation being able to change it).
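To make the roles of the four parameters concrete, here is a minimal sketch of dual averaging on the log step size. It follows the update in Stan's step size adaptation as I understand it. delta is the target acceptance statistic. gamma, kappa, and t0 control how strongly and how quickly the running average moves. The initial `stepsize` sets the point log(10 * stepsize) that the iterates are shrunk toward. The class and helper names are hypothetical.

```python
import math


class DualAveragingStepSize:
    """Nesterov dual averaging of log(step size) during warmup."""

    def __init__(self, stepsize=1.0, delta=0.8, gamma=0.05, kappa=0.75, t0=10.0):
        self.delta, self.gamma, self.kappa, self.t0 = delta, gamma, kappa, t0
        self.mu = math.log(10.0 * stepsize)  # log step sizes are shrunk toward this
        self.counter = 0
        self.s_bar = 0.0  # running average of (delta - acceptance statistic)
        self.x_bar = 0.0  # weighted average of the log step sizes
        self.stepsize = stepsize

    def learn(self, accept_stat):
        """Update after one warmup iteration and return the next step size."""
        self.counter += 1
        accept_stat = min(accept_stat, 1.0)
        eta = 1.0 / (self.counter + self.t0)
        self.s_bar = (1.0 - eta) * self.s_bar + eta * (self.delta - accept_stat)
        x = self.mu - self.s_bar * math.sqrt(self.counter) / self.gamma
        x_eta = self.counter ** (-self.kappa)
        self.x_bar = (1.0 - x_eta) * self.x_bar + x_eta * x
        self.stepsize = math.exp(x)
        return self.stepsize

    def final_stepsize(self):
        """Step size kept after warmup: exp of the averaged log step size."""
        return math.exp(self.x_bar)


def run_constant(accept_stat, n_iter=100, **kwargs):
    """Feed a constant acceptance statistic and return the final step size."""
    adapter = DualAveragingStepSize(**kwargs)
    for _ in range(n_iter):
        adapter.learn(accept_stat)
    return adapter.final_stepsize()


if __name__ == "__main__":
    # Acceptance below delta shrinks the step size; acceptance above delta grows it.
    print(run_constant(0.0), run_constant(1.0))
```

Raising delta raises the acceptance rate the adaptation aims for, which pushes it toward a smaller step size. That is why delta is the knob people reach for when divergences appear.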

The mass matrix (inv_metric), init_buffer, term_buffer, and window configure the adaptation of the inverse metric elements. It’s not that these shouldn’t be exposed to the user, but rather that they should be exposed only in certain patterns. For example, if a user wants to specify their own inverse metric components and not allow them to be changed, they might call something like adaptation=static_inv_metric, where inv_metric is set and window is forced to zero at the same time.
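Here is a simplified sketch of how init_buffer, window, and term_buffer carve up warmup into the "slow" windows where the inverse metric is estimated. Each window doubles the previous one. The last window is stretched to end where the terminal buffer begins. With the defaults (1000 warmup iterations, 75/25/50), this reproduces Stan's documented schedule: 75 iterations of initial buffer, then windows of 25, 50, 100, 200, and 500, then 50 iterations of terminal buffer. The function name is hypothetical. As I understand it, Stan rescales the buffers when warmup is too short for them; the sketch does not model that and just raises. Treating window=0 as "no metric adaptation" follows the hypothetical static_inv_metric pattern above, not an existing Stan option.

```python
def metric_adaptation_windows(num_warmup=1000, init_buffer=75, term_buffer=50, window=25):
    """Return the [start, end) warmup iterations of each slow adaptation window."""
    if window == 0:
        # Hypothetical static inverse metric mode: nothing is adapted.
        return []
    limit = num_warmup - term_buffer  # slow windows must end before the terminal buffer
    if init_buffer + window + term_buffer > num_warmup:
        raise ValueError("warmup too short for these buffers (rescaling not modeled here)")
    windows = []
    start, size = init_buffer, window
    while True:
        end = start + size
        # If the next (doubled) window would not fit, stretch this one to the limit.
        if end + 2 * size > limit:
            end = limit
        windows.append((start, end))
        if end == limit:
            return windows
        start, size = end, 2 * size


if __name__ == "__main__":
    for start, end in metric_adaptation_windows():
        print(f"iterations {start}-{end - 1}: {end - start} draws")
```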


Perfect. So just stepsize for beginners – and max_depth?

While I have everyone’s attention: what about init_radius? Does anyone use it? I always specify initial values directly. I might have used init_radius once in my life.

Sometimes


Should we get rid of them totally? Which params would a user need from a Stan model in order to restart a run?

@betanalpha I’m only thinking about hmc_nuts_diag_e_adapt at the moment (to use the Stan services function name). This is going to be what users use when they call the default sample function in PyStan 3.

Are you saying that the user should ignore delta and manually set stepsize with no step size adaptation? (That would be hmc_nuts_diag_e, right?)

Step size adaptation should largely be limited to the defaults, with adapt_delta or another method exposed to lower the initial fitted step size in the presence of divergences. max_depth should be accessible for those with particularly long trajectories.
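A rough sketch of what the two user-facing knobs cost or change. Each NUTS iteration that hits max_depth uses 2**max_depth - 1 leapfrog steps (1023 at the default of 10), so every extra level roughly doubles the worst-case cost per iteration. For delta, `raised_delta` is a hypothetical heuristic, not anything Stan does: it moves delta halfway toward 1 each time divergences persist (0.8, 0.9, 0.95, ...). Both function names are made up for illustration.

```python
def max_leapfrog_steps(max_depth: int) -> int:
    """Upper bound on leapfrog steps (gradient evaluations) per NUTS iteration:
    a trajectory that reaches max_depth uses 2**max_depth - 1 steps."""
    return 2 ** max_depth - 1


def raised_delta(delta: float, fraction: float = 0.5) -> float:
    """Hypothetical heuristic: move delta `fraction` of the way toward 1.
    A higher target acceptance statistic makes adaptation pick a smaller step size."""
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return delta + fraction * (1.0 - delta)


if __name__ == "__main__":
    for depth in (10, 11, 12):
        # Each extra level of depth roughly doubles the worst-case cost.
        print(depth, max_leapfrog_steps(depth))
    d = 0.8
    for _ in range(3):
        d = raised_delta(d)
        print(round(d, 3))
```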

Every time I’ve used it, I was dealing with what became clear in hindsight was a bad model, and I think that’s the circumstance in which it’s employed most of the time. That said, I don’t see any reason to remove it, as there are principled ways to use it based on prior choice and general scales in the problem.

As we talked about at the developer retreat, this function is probably too overloaded for most users. Instead of having a suite of adaptation parameters that configure the adaptation routes, we talked about breaking the main service route into different functions, each designed for a specific kind of adaptation. For example: nominal, restart, init_metric, reduced_stepsize, or the like.
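One way the split could look from the interface side is sketched below. Each entry point fixes which services function runs and which few arguments it may override, instead of exposing every adaptation parameter at once. This is a hypothetical illustration under several assumptions. The four entry point names come from the post. The `ServiceCall` type, the argument names in `kwargs`, the delta bounds, and the interpretation of "restart" are all made up. Here "restart" means reusing a previous run's tuned step size, inverse metric, and final draw with no warmup. The two services names are the ones already used in this thread.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Sequence


@dataclass
class ServiceCall:
    """Which Stan services function to call, plus the arguments that differ
    from that function's defaults (argument names here are hypothetical)."""
    function: str
    kwargs: Dict[str, Any] = field(default_factory=dict)


def nominal() -> ServiceCall:
    # Everything at its default: the usual warmup with full adaptation.
    return ServiceCall("hmc_nuts_diag_e_adapt")


def reduced_stepsize(delta: float = 0.95) -> ServiceCall:
    # Raise the target acceptance statistic so adaptation settles on a smaller step size.
    if not 0.8 < delta < 1.0:
        raise ValueError("delta should lie between the default 0.8 and 1")
    return ServiceCall("hmc_nuts_diag_e_adapt", {"delta": delta})


def init_metric(inv_metric: Sequence[float]) -> ServiceCall:
    # Start adaptation from a user-supplied diagonal inverse metric.
    return ServiceCall("hmc_nuts_diag_e_adapt", {"inv_metric": list(inv_metric)})


def restart(stepsize: float, inv_metric: Sequence[float], init: Dict[str, Any]) -> ServiceCall:
    # Reuse a previous run's tuning and final draw, with adaptation switched off.
    return ServiceCall(
        "hmc_nuts_diag_e",
        {"stepsize": stepsize, "inv_metric": list(inv_metric), "init": init, "num_warmup": 0},
    )


if __name__ == "__main__":
    print(nominal())
    print(reduced_stepsize(0.9))
    print(restart(0.3, [1.0, 2.0], {"mu": 0.0}))
```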

No. By default they shouldn’t set anything; the question is rather what happens when divergences pop up and they have to modify the step size.

In the interest of simplicity (fewer possible arguments), one could specify initial values directly instead of using init_radius, right?
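For comparison, here is a sketch of what init_radius does, based on CmdStan's documented behavior. Each unconstrained parameter is drawn uniformly from (-init_radius, init_radius), and init_radius = 0 starts every unconstrained value at zero. The function names and the example model with a single `sigma` declared `<lower=0>` are hypothetical. For that parameter the unconstrained value is log(sigma), so the default radius of 2 means sigma starts somewhere in [exp(-2), exp(2)]. Passing explicit initial values such as `{"sigma": 1.0}` corresponds to radius 0 here.

```python
import math
import random


def random_unconstrained_inits(dim, init_radius=2.0, seed=None):
    """Uniform draws on (-init_radius, init_radius) for each unconstrained
    parameter; init_radius == 0 gives all zeros."""
    if init_radius == 0:
        return [0.0] * dim
    rng = random.Random(seed)
    return [rng.uniform(-init_radius, init_radius) for _ in range(dim)]


def sigma_inits(init_radius=2.0, seed=None):
    """Hypothetical one-parameter model: sigma has a lower bound of 0, so its
    unconstrained value is log(sigma) and the constrained value is exp(u)."""
    (u,) = random_unconstrained_inits(1, init_radius, seed)
    return {"sigma": math.exp(u)}


if __name__ == "__main__":
    print(sigma_inits(seed=1))   # random start in [exp(-2), exp(2)]
    print(sigma_inits(0.0))      # same as passing {"sigma": 1.0} directly
```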

OK, this sounds really good. In the interest of having a minimum viable PyStan 3, which of these should I implement? Or should I put in the docs “if you encounter a problem with divergences, you need to use CmdStan or PyStan 2”?

Instead of having them all as function arguments, what if we allowed the user to specify a config file for the sampler? So it would be like:

my_mod.sample(blah, blah1, sampler_config="./my_settings.conf")

That way we reduce the number of arguments to the function while still letting people fiddle with things if they want.

This isn’t a common technique, in Python libraries at least. A key disadvantage is that it requires working with the file system, which can vary dramatically depending on the OS. We also want people without a file system (say, online Jupyter users) to be able to configure things.

fwiw, here’s the current set of params for CmdStanPy’s sample method:

https://cmdstanpy.readthedocs.io/en/latest/api.html#cmdstanpy.CmdStanModel.sample

That’s fine, I was mostly throwing it out there. If the goal is just to shrink the signature, some sort of dictionary config would probably do that as well.
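A minimal sketch of the dictionary variant. The sampler takes one optional mapping and rejects unknown keys up front, so a typo fails loudly instead of being silently ignored. The accepted names are the parameters listed at the top of the thread, with `inv_metric` standing in for the mass matrix. `validate_sampler_config` and the `sampler_config` keyword are hypothetical, not part of PyStan. A file-based setup would still work for those who want it, because a JSON file loaded with `json.load` produces the same kind of dict.

```python
from typing import Any, Dict, Mapping, Optional

# Parameter names from the list at the top of the thread ("inv_metric" = mass matrix).
KNOWN_SAMPLER_PARAMS = {
    "inv_metric", "stepsize", "stepsize_jitter", "max_depth", "delta", "gamma",
    "kappa", "t0", "init_buffer", "term_buffer", "window", "init_radius",
}


def validate_sampler_config(sampler_config: Optional[Mapping[str, Any]] = None) -> Dict[str, Any]:
    """Return a copy of the user's overrides, rejecting unknown parameter names.
    Anything not given falls back to the services-level defaults downstream."""
    if sampler_config is None:
        return {}
    unknown = sorted(set(sampler_config) - KNOWN_SAMPLER_PARAMS)
    if unknown:
        raise ValueError(f"unknown sampler parameter(s): {', '.join(unknown)}")
    return dict(sampler_config)


if __name__ == "__main__":
    # Hypothetical call site: my_mod.sample(data, sampler_config={"delta": 0.95})
    print(validate_sampler_config({"delta": 0.95, "max_depth": 12}))
    try:
        validate_sampler_config({"adapt_delta": 0.95})
    except ValueError as err:
        print(err)
```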