Sampler parameters which the typical user might need to set (hmc_nuts_diag_e_adapt only)

In NYC this summer we talked at one point about simplifying the default sampling function (hmc_nuts_diag_e_adapt) in PyStan/httpstan/others(?) by fixing some of the sampler parameters at their default values (i.e., not allowing the typical user to set them). (Advanced users would have lots of alternatives.)

I recall there being a consensus that the typical user should not manipulate most of the parameters. I’m having trouble recalling which ones the typical user should potentially be allowed to change. Can someone help? I think @betanalpha had opinions about this.

For reference, here are the sampler parameters I’m thinking about:

  • mass matrix
  • stepsize
  • stepsize_jitter
  • max_depth
  • delta
  • gamma
  • kappa
  • t0
  • init_buffer
  • term_buffer
  • window
  • init_radius

The thinking was that some of these sampler parameters depend on each other, so manipulating them in isolation is generally not going to yield good results for the typical PyStan/RStan/*Stan user.

Added 2019-11-05: Added init_radius to discussion.


stepsize_jitter should be deprecated regardless.

delta, gamma, kappa, and t0 configure the dual averaging of the integrator step size. The last three probably shouldn’t be touched by anyone not heavily versed in Nesterov optimization theory, but delta sets the target acceptance statistic and would nominally be exposed in case of divergences. That said, it might be easier to avoid delta entirely and have users set the step size themselves (without the adaptation being able to change it).
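To make the roles of these four concrete, here is a minimal sketch of the dual-averaging update from the NUTS paper (Hoffman &amp; Gelman 2014, Algorithm 6); the function name and the simplified loop are illustrative, not Stan’s actual implementation:

```python
import math

# Illustrative sketch of dual-averaging step size adaptation showing
# what delta, gamma, kappa, and t0 each do. Not Stan's implementation.
def dual_averaging(accept_stats, mu, delta=0.8, gamma=0.05, kappa=0.75, t0=10.0):
    """Return the smoothed step size after one warmup pass.

    accept_stats: per-iteration acceptance statistics alpha_t
    mu:           log-step-size shrinkage target (log(10 * eps_0) in Stan)
    delta:        target acceptance statistic
    gamma, kappa, t0: dual-averaging tuning constants
    """
    h_bar, log_eps_bar = 0.0, 0.0
    for t, alpha in enumerate(accept_stats, start=1):
        eta = 1.0 / (t + t0)
        # Running estimate of how far acceptance is from the target delta.
        h_bar = (1.0 - eta) * h_bar + eta * (delta - alpha)
        log_eps = mu - math.sqrt(t) / gamma * h_bar
        # Polyak-style averaging with forgetting rate kappa.
        w = t ** -kappa
        log_eps_bar = w * log_eps + (1.0 - w) * log_eps_bar
    return math.exp(log_eps_bar)
```

When acceptance runs below delta the adapted step size shrinks, and above delta it grows, which is why raising delta is the usual knob for reducing divergences.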

inv_metric (the mass matrix), init_buffer, term_buffer, and window configure the adaptation of the inverse metric elements. It’s not that these shouldn’t be exposed to the user, but rather that they should be exposed only in certain patterns. For example, if a user wants to specify their own inverse metric components and not allow them to be changed, then they might call something like adaptation=static_inv_metric, where inv_metric is set and window is forced to zero at the same time.
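As a sketch of that pattern (the name `static_inv_metric` and the keyword arguments are hypothetical, not an existing PyStan/httpstan API), a wrapper could accept a diagonal inverse metric and force `window=0` in the same call:

```python
# Hypothetical helper: builds keyword arguments for a sampler call with a
# fixed, user-supplied diagonal inverse metric. The names `inv_metric`
# and `window` mirror the parameters discussed above.
def static_inv_metric(inv_metric, **kwargs):
    if any(v <= 0 for v in inv_metric):
        raise ValueError("inverse metric elements must be positive")
    # window=0 means the supplied metric is never adapted during warmup.
    kwargs.update(inv_metric=list(inv_metric), window=0)
    return kwargs
```

For example, `static_inv_metric([1.0, 0.5, 2.0], max_depth=12)` yields kwargs in which the user’s metric can never be overwritten, so the two settings can’t drift out of sync.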


Perfect. So just stepsize for beginners – and max_depth?

While I have everyone’s attention. What about init_radius? Does anyone use this? I always specify initial values directly. I might have used init_radius once in my life.



Should we get rid of them totally? Which params would a user need from a Stan model in order to restart a run?

@betanalpha I’m only thinking about hmc_nuts_diag_e_adapt at the moment (to use the Stan services function name). This is going to be what users use when they call the default sample function in PyStan 3.

Are you saying that the user should ignore delta and manually set stepsize with no stepsize adaptation? (This is hmc_nuts_diag_e right?)

Stepsize adaptation should largely be limited to defaults, with adapt_delta or another method to lower the initial fitted step size in the presence of divergences. max_depth should be accessible for those with particularly long trajectories.
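A minimal sketch of that workflow (the helper name is made up for illustration): when divergences appear, the user nudges the target acceptance statistic toward 1, which lowers the adapted step size, instead of touching gamma, kappa, or t0:

```python
# Hypothetical helper illustrating the suggested divergence workflow:
# raise adapt_delta toward 1 so the adapted step size comes out smaller.
def raise_adapt_delta(current_delta, step=0.04, ceiling=0.999):
    if not 0.0 < current_delta < 1.0:
        raise ValueError("adapt_delta must be in (0, 1)")
    return min(ceiling, current_delta + step)
```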

Every time I’ve used it I was dealing with what became clear in hindsight was a bad model, and I think that’s the circumstance in which it’s employed most of the time. That said, I don’t see any reason to remove it, as there are principled ways to use it based on prior choice and the general scales in the problem.

As we talked about at the developer retreat, this function is probably too overloaded for most users. Instead of having a suite of adaptation parameters that configure the adaptation routines, we talked about breaking the main service route into different functions, each designed for a specific kind of adaptation: for example, nominal, restart, init_metric, reduced_stepsize, or the like.

No. By default they shouldn’t set anything; the question is rather what happens when divergences pop up and they have to modify the step size.

In the interest of simplicity (fewer possible arguments), one could specify initial values directly instead of using init_radius, right?
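Right, and the radius behavior is easy to emulate on the interface side. Here’s a sketch (plain Python, not an actual PyStan function) that generates explicit inits mimicking init_radius:

```python
import random

# Sketch: emulate init_radius by drawing each unconstrained parameter
# uniformly from (-radius, radius), returning an explicit inits dict.
def uniform_inits(param_names, radius=2.0, seed=None):
    rng = random.Random(seed)
    return {name: rng.uniform(-radius, radius) for name in param_names}
```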

Ok. This sounds really good. In the interest of having a minimum viable PyStan 3, which of these shall I implement? Or should I put in the docs “if you encounter a problem with divergences, you need to use cmdstan or pystan 2”?

Instead of having them all as function arguments what if we allowed the user to specify a config file for the sample? So it would be like

my_mod.sample(blah, blah1, sampler_config="./my_settings.conf")

So then we reduce the args in the function while also letting people fiddle with things if they want

This isn’t a common technique, in Python libraries at least. A key disadvantage is that it requires working with the file system, which can vary dramatically depending on the OS. We also want people without file systems (say, online Jupyter users) to be able to configure things.

fwiw, here’s the current set of params for CmdStanPy’s sample method:

That’s fine, I was mostly throwing it out there. If the goal is just to shrink the signature, some sort of dictionary config would probably do that as well.
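A sketch of the dictionary version (the defaults and key names here are illustrative, not PyStan 3’s actual values): merge user overrides into the defaults and reject unknown keys, so the sample signature stays small:

```python
# Illustrative defaults; not the actual PyStan 3 values.
DEFAULTS = {"stepsize": 1.0, "max_depth": 10, "adapt_delta": 0.8}

def resolve_sampler_config(overrides=None):
    """Merge user overrides into the defaults, rejecting unknown keys."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown sampler parameters: {sorted(unknown)}")
    return {**DEFAULTS, **overrides}
```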

In my opinion a minimal viable interface would have nominal and a way to adjust the step size, through modifying adapt_delta or whatever. Modifying the metric by hand is definitely a more advanced feature.

One possibility that has been discussed is having a stateful object that initializes to the default configuration which can then be modified through mutator methods. This was the intention of the argument configuration validator class used in CmdStan, but it’s not clear how that would work with a C++ API being accessed by both Python and R clients.
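On the Python side that object might look something like this (a sketch, not the CmdStan validator class): defaults on construction, with chainable mutator methods that validate as they go:

```python
# Sketch of a stateful sampler configuration with validating mutators.
# Parameter names and defaults are illustrative.
class SamplerConfig:
    def __init__(self):
        self.stepsize = 1.0
        self.max_depth = 10
        self.adapt_delta = 0.8

    def set_adapt_delta(self, value):
        if not 0.0 < value < 1.0:
            raise ValueError("adapt_delta must be in (0, 1)")
        self.adapt_delta = value
        return self  # chainable

    def set_max_depth(self, value):
        if value < 1:
            raise ValueError("max_depth must be >= 1")
        self.max_depth = int(value)
        return self
```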

I think we are both talking about an input config and an output config with defaults, in which case yes, I agree.

In my mind this feels like an R tempfile or a Python pickle that we point CmdStan to, right?

Specifying initial values is seriously annoying. The radius (or the R equivalent) is very handy. I’m happy to get information, but I prefer software that doesn’t take every opportunity to beat me over the head with a stick in case my model is not somebody’s idea of optimal.


The problem is that this introduces a file system intermediary, which breaks up the current flow of the interfaces, where everything is local to the interface environment.

How do you prevent typical users from setting a feature that an advanced user could set?

I find a need to lower this in 100-coefficient logistic regression simulations. Those can be really numerically unstable at wide initial values.

What kind of models do you have where that’s necessary? We almost never have to do that with the more recent versions of Stan.

That’s a huge burden for large models.

Does “nominal” just mean default here?

Obfuscation through documentation – they would just not be shown in the introductory treatments and examples. Many of the existing arguments have this flavor already.

Yes, “nominal” in the physics/engineering sense of default.