I was wondering whether it is possible to use the information obtained during warmup in a Stan run to set the ODE solver tolerances. The mass matrix captures the scale of our parameters, which should be very valuable information for setting relative precision targets for the ODE solver.
One would have to start with conservative guesses, but at some point it would make sense to take advantage of the known scales. My fear is that this is impossible to do as it is certainly an unorthodox usage of this piece of information.
How would you connect the scale of parameters in the mass matrix to the scale of the solutions of the ODE? The solutions are highly non-linear functions of the parameters and there may be tricks like 1000 * theta going on when a parameter theta is used, meaning the scale of use doesn’t match the scale of declaration.
My goal is to tune the ODE integrator so that we compute to just the right precision. Higher precision wastes computation, and lower precision is not acceptable either. Since the numerical cost of ODE integration apparently dominates Stan programs with ODEs, this could give us better speed, and I think it hasn’t been explored much yet.
I would like to use the step size and mass for a parameter in order to set the absolute tolerances for the sensitivities of the states. The CVODES user guide states that:
"The selection of absolute tolerances for the sensitivity variables is based on the observation that the sensitivity vector s_i will have units of [y]/[p_i]. With this, the absolute tolerance for the j-th component of the sensitivity vector s_i is set to atol_j/|bar(p)_i|, where atol_j are the absolute tolerances for the state variables and bar(p) is a vector of scaling factors that are dimensionally consistent with the model parameters p and give an indication of their order of magnitude."
I think this is worthwhile to explore (currently bar(p) is set to the default of 1, i.e. no scaling). However, I realize that it is way too early to think about putting anything into Stan; at this point I should experiment a bit. I was just wondering whether I could embark on this within Stan if I wanted to.
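To make the quoted rule concrete, here is a minimal sketch of the tolerance scaling it describes. The function name `sensitivity_atol` and the example numbers are made up for illustration; this is not how CVODES or Stan implement it internally, just the arithmetic atol_j/|bar(p)_i| written out.

```python
def sensitivity_atol(atol, pbar):
    """Return a table with entry [i][j] = atol[j] / |pbar[i]|.

    atol: absolute tolerances of the state variables (index j)
    pbar: order-of-magnitude scaling factors of the parameters (index i)
    """
    if any(p == 0 for p in pbar):
        raise ValueError("scaling factors must be nonzero")
    return [[a / abs(p) for a in atol] for p in pbar]


# Hypothetical example: two states, three parameters.
atol_states = [1e-8, 1e-6]

# Default scaling (all ones): sensitivities inherit the state tolerances.
print(sensitivity_atol(atol_states, [1.0, 1.0, 1.0]))

# A parameter of order 1000 gets tighter sensitivity tolerances,
# a parameter of order 0.01 gets looser ones.
print(sensitivity_atol(atol_states, [1000.0, 0.01, 1.0]))
```

With the default scaling of 1 every sensitivity simply reuses the state tolerances, which is the situation described above.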
This stuff has, of course, the usual chicken-and-egg problem, just like warmup: the scalings can only be set well once we have good knowledge of them, which we only have after having integrated the system a few times.
You’re missing Bob’s point: the scales learned in adaptation are the marginal posterior variances, which have nothing to do with the scales of the ODE system that are needed for tuning the numerical solver.
Hmm… a bit against my intuition, but I suppose you are right and one would need either the Jacobian at the typical values or even better a sample of the integrated function values themselves.
Still, I think the general idea of using parts of the warmup for tuning ODE tolerances is worthwhile to explore - but as said, I will do so on my own, and whether this should ever land in Stan is yet another thing.
The step size is already adapted as part of the algorithm, right? Initializing better might help.
In terms of automatically adapting the tolerances, the only way I see to do that is to start with very strict tolerances, then see if you can loosen them and still produce the same answers.
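A rough sketch of that start-strict-then-loosen loop, under several assumptions: the integrator below is a toy adaptive Heun-Euler scheme standing in for CVODES, the test problem is a simple exponential decay, and "same answers" is taken to mean agreement with the strict reference within a chosen relative `target`. All names here are hypothetical.

```python
import math


def solve(f, y0, t_end, rtol, atol):
    """Toy adaptive Heun-Euler integrator (stand-in for CVODES); returns y(t_end)."""
    t, y, h = 0.0, list(y0), t_end / 100.0
    while t_end - t > 1e-12 * t_end:
        h = min(h, t_end - t)
        k1 = f(t, y)
        y_euler = [yi + h * k for yi, k in zip(y, k1)]
        k2 = f(t + h, y_euler)
        y_heun = [yi + 0.5 * h * (a + b) for yi, a, b in zip(y, k1, k2)]
        # Scaled error estimate: a value <= 1 means the step meets the tolerances.
        err = max(abs(a - b) / (atol + rtol * abs(b))
                  for a, b in zip(y_euler, y_heun))
        if err <= 1.0:
            t, y = t + h, y_heun
        # Standard step-size controller, limited to a factor in [0.2, 2].
        h *= min(2.0, max(0.2, 0.9 / math.sqrt(err + 1e-300)))
    return y


def relax_tolerance(f, y0, t_end, strict=1e-8, loosest=1e-2, target=1e-4):
    """Loosen the tolerance by factors of 10 while the solution stays
    within relative distance `target` of the strict reference solution."""
    ref = solve(f, y0, t_end, strict, strict)
    chosen, tol = strict, strict * 10
    while tol <= loosest * (1 + 1e-9):
        y = solve(f, y0, t_end, tol, tol)
        rel_diff = max(abs(a - b) / max(abs(b), 1e-300) for a, b in zip(y, ref))
        if rel_diff > target:
            break
        chosen, tol = tol, tol * 10
    return chosen


# Hypothetical test problem: exponential decay y' = -y, y(0) = 1.
decay = lambda t, y: [-y[0]]
print(relax_tolerance(decay, [1.0], 2.0))
```

In a real Stan setting the comparison would more sensibly be made on the quantities that enter the log density (and their gradients) rather than on a single end-point value, and it would have to be repeated across the region of parameter space the sampler actually visits.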
Yup, this is the thought I had… start conservative and then relax. However, for now I will attempt a simpler approach, which is to draw a few samples from the prior (which should span a huge range of parameters), integrate the system for each, and use the values of the integrated function to find conservative tolerances.
BTW, the shifting operation applied whenever initials vary does change the meaning of the supplied absolute and relative tolerances in a way which is inconsistent with the case when only the parameters vary, but not the initials. This makes these choices a bit harder.
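One way the prior-draw idea could look, as a sketch only. The two-state linear model, the lognormal priors, the fixed-step RK4 integrator and the rule "absolute tolerance = rtol times the smallest peak magnitude seen across draws" are all assumptions chosen for illustration, not something taken from Stan or CVODES.

```python
import random


def rk4(f, y0, t_end, n_steps):
    """Fixed-step classical RK4; returns the list of states at every step."""
    h = t_end / n_steps
    t, y, path = 0.0, list(y0), [list(y0)]
    for _ in range(n_steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, [a + h / 2 * b for a, b in zip(y, k1)])
        k3 = f(t + h / 2, [a + h / 2 * b for a, b in zip(y, k2)])
        k4 = f(t + h, [a + h * b for a, b in zip(y, k3)])
        y = [a + h / 6 * (p + 2 * q + 2 * r + s)
             for a, p, q, r, s in zip(y, k1, k2, k3, k4)]
        t += h
        path.append(y)
    return path


def prior_based_atol(n_draws=50, rtol=1e-6, seed=1):
    """Integrate the system for a few prior draws and derive per-state
    absolute tolerances from the smallest peak magnitude observed."""
    rng = random.Random(seed)
    scales = None
    for _ in range(n_draws):
        # Hypothetical priors on two rate constants.
        k1 = rng.lognormvariate(0.0, 1.0)
        k2 = rng.lognormvariate(-1.0, 1.0)
        # Hypothetical model: y0 decays into y1, which is then eliminated.
        rhs = lambda t, y: [-k1 * y[0], k1 * y[0] - k2 * y[1]]
        path = rk4(rhs, [100.0, 0.0], 10.0, 2000)
        peak = [max(abs(state[j]) for state in path) for j in range(2)]
        # Conservative choice: keep the smallest peak seen for each state.
        scales = peak if scales is None else [min(a, b) for a, b in zip(scales, peak)]
    return [rtol * s for s in scales]


print(prior_based_atol())
```

Taking the minimum over draws errs on the strict side; a quantile of the observed magnitudes would be a less conservative alternative.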