Currently, the regularization constant pulling the metric back towards the identity is hard-coded as `1e-3 * (5.0 / (n + 5.0))`, where `n` is the size of the current metric adaptation window (`stan/covar_adaptation.hpp` at develop, stan-dev/stan on GitHub). Would it be possible to change this, or to make it configurable?
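For reference, the shrinkage in `covar_adaptation.hpp` can be sketched as follows (the function name and the example numbers are mine; `1e-3` is the hard-coded constant in question):

```python
import numpy as np

def regularize_metric(sample_cov, n, reg_const=1e-3):
    """Shrink a sample covariance estimate towards the identity,
    mirroring the update in stan/covar_adaptation.hpp:
    metric = n/(n+5) * sample_cov + reg_const * 5/(n+5) * I."""
    dim = sample_cov.shape[0]
    shrinkage = n / (n + 5.0)                        # weight on the sample estimate
    identity_weight = reg_const * (5.0 / (n + 5.0))  # pull towards the identity
    return shrinkage * sample_cov + identity_weight * np.eye(dim)

# With a long adaptation window the identity term becomes tiny, so a
# decayed eigenvalue of order 1e-6 is lifted only to roughly
# reg_const * 5/(n+5), not back to unit scale.
cov = np.diag([1e-6, 1.0])
print(regularize_metric(cov, n=100))
```

Making `reg_const` an argument is the kind of change being asked about here.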
Or, alternatively, is it possible to have Stan print the unconstrained values instead of the constrained values to the output CSV?
What’s the motivation for the change? It can easily be changed in the source, but such changes can have very large unintended effects.
In CmdStan the unconstrained values are stored in a file specified by the output `diagnostic_file` option. This functionality should be available in all of the CmdStan wrappers, and it is also available in RStan with the rarely used `diagnostic_file` argument.
Keep in mind that the column names are not changed, so if one has `real<lower=0> x` then the columns for the constrained and unconstrained values will both be named `x` (because of how the model and samplers are abstracted there’s no way to determine which transformation is being applied, and hence no way to use a more principled naming convention). Similarly, multivariate constrained types with lower-dimensional unconstrained spaces, such as simplices and covariance matrices, will have fewer elements in the `diagnostic_file` output.
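As an illustration of why the same column name can carry different values on the two scales, here is the lower-bound transform as documented in the Stan reference manual (function names are mine):

```python
import math

def unconstrain_lower_bound(x, lb=0.0):
    """real<lower=lb> x is represented on the unconstrained scale as log(x - lb)."""
    return math.log(x - lb)

def constrain_lower_bound(y, lb=0.0):
    """Inverse transform back to the constrained scale."""
    return lb + math.exp(y)

# A draw x = 2.5 with lower=0 appears as log(2.5) ~ 0.916 in the
# diagnostic_file, under the same column name "x".
print(unconstrain_lower_bound(2.5))
```

For the multivariate case, note for example that a `simplex[K]` has only `K - 1` unconstrained parameters (via the stick-breaking transform), which is why the `diagnostic_file` columns can be fewer than the constrained ones.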
It looks like for chaotic ODE systems, the diagonal entries/eigenvalues of the covariance matrix can very quickly decay from unit scale on the prior to the order of 1e-6 or less. Note that I don’t want to change the default behavior; I would only like to know how different regularization constants would affect the behavior of the sampler/warmup.
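To make the scales concrete, a quick standalone sketch (my own numbers, assuming the shrinkage form `n/(n+5) * sample_cov + c * 5/(n+5) * I` used in Stan's covariance adaptation) of how the constant `c` floors a decayed eigenvalue:

```python
def metric_floor(sample_eig, n, c):
    """Eigenvalue of the adapted metric after identity shrinkage,
    for a direction whose sample eigenvalue has decayed to sample_eig."""
    return (n / (n + 5.0)) * sample_eig + c * (5.0 / (n + 5.0))

n = 75          # a hypothetical metric adaptation window size
sample_eig = 1e-6  # a decayed eigenvalue, as described above
for c in (1e-3, 1e-2, 1e-1):
    print(c, metric_floor(sample_eig, n, c))
```

Under these assumptions the default `c = 1e-3` lifts a 1e-6 eigenvalue only to about 6e-5, so larger constants would keep the metric noticeably closer to the identity in those directions.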
I am fairly confident that this is not a problem with the metric adaptation but rather the warmup exploration itself, and hence changing the metric adaptation configuration won’t help.
Singular covariance estimates arise when the sampler freezes. For dynamic Hamiltonian Monte Carlo this can happen due to biased gradient evaluations (the state probabilities quickly decay away from the initial point, so that even if a long trajectory is constructed only points near the initial point are ever sampled) or poor initial adaptation (the step size quickly drops to zero and the numerical trajectories end up confined near the initial point because of the max treedepth limit). These are often related, with the small step size early in warmup caused by numerical problems in evaluating the target density or its gradient.
No amount of fussing with the metric will fix this; one has to instead work out why the sampler is freezing.
I’m fairly confident in the correctness of the covariance estimate, just as I do not doubt the inadequacy of the default warmup procedure for even moderately difficult chaotic ODE problems. Still, I’m curious how the regularization constant influences the behavior.
Note that due to the “sensitive dependence on initial conditions” (and on the ODE parameters), a large part of the parameter space very quickly becomes incompatible with the (in silico) “observations”.
(The simplest non-boring chaotic ODE problem should be the Lorenz system with not-super-tight priors.)
Chaotic ODEs have bigger problems. If the base system is chaotic then the forward sensitivity system (and eventually the backward adjoint system) will also be chaotic, and numerical solves will quickly decouple from each other. In other words, the numerically-solved sensitivities will no longer correspond to sensitivities of the numerically-solved states, and the computed gradients will no longer be related to the gradients of the target density. This is where the entire numerical correction scheme in Stan’s dynamic Hamiltonian Monte Carlo breaks down and adaptation is irrelevant.
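The exponential blow-up of the forward sensitivity system can be seen directly. A sketch (my own construction, not Stan code): augment the Lorenz state with its sensitivity with respect to `rho`, whose dynamics are driven by the Jacobian of the base system.

```python
import numpy as np

SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0

def augmented_rhs(aug):
    """Lorenz state plus its forward sensitivity with respect to rho.
    aug = [x, y, z, s1, s2, s3] where s = d(state)/d(rho) evolves as
    ds/dt = J(state) @ s + df/drho."""
    x, y, z = aug[:3]
    s = aug[3:]
    f = np.array([SIGMA * (y - x), x * (RHO - z) - y, x * y - BETA * z])
    jac = np.array([[-SIGMA, SIGMA, 0.0],
                    [RHO - z, -1.0, -x],
                    [y, x, -BETA]])
    df_drho = np.array([0.0, x, 0.0])
    return np.concatenate([f, jac @ s + df_drho])

def rk4(f, state, dt, steps):
    """Fixed-step fourth-order Runge-Kutta integrator."""
    for _ in range(steps):
        k1 = f(state)
        k2 = f(state + 0.5 * dt * k1)
        k3 = f(state + 0.5 * dt * k2)
        k4 = f(state + dt * k3)
        state = state + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return state

aug0 = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])  # sensitivity starts at zero
aug = rk4(augmented_rhs, aug0, 0.01, 2000)        # integrate to t = 20
print(np.linalg.norm(aug[3:]))  # sensitivity magnitude grows exponentially
```

While the state stays bounded on the attractor, the sensitivity norm explodes, so any fixed numerical tolerance eventually fails to keep the solved sensitivities consistent with the solved states.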