I think the ODE docs (v2.25), particularly Section 13.6, have some parts that could be improved or clarified to give users better information on how to use the solvers.
The documentation says: “absolute_tolerance control the stepsize by specifying a target local error that the solver tries to match …”
- It would be good to point out that the methods can only estimate their error (for example, RK45 does this by computing both a 4th- and a 5th-order solution) and that the true error is not known, so as not to give users a false sense of security.
- It would be good to show the actual inequality that needs to be satisfied, as is done here.
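For illustration, here is a minimal Python sketch of both points. It uses the embedded Heun-Euler 2(1) pair, swapped in for the 4(5) pair behind RK45 only to keep the code short, and a componentwise mixed test err_est <= atol + rtol * |y|. That form of the inequality is an assumption on my part and may differ in detail (norm, weighting) from what Stan's solvers use; the function names are hypothetical. The point is that the "error" the solver controls is the difference between two solutions of different order, not the true error.

```python
import math


def heun_euler_step(f, t, y, h):
    """Take one step of the embedded Heun-Euler 2(1) pair (hypothetical helper).

    Returns the 2nd-order (Heun) solution and the error estimate, which is
    simply the difference between the 2nd- and 1st-order (Euler) solutions.
    """
    k1 = f(t, y)
    k2 = f(t + h, y + h * k1)
    y_low = y + h * k1              # Euler, order 1
    y_high = y + h * (k1 + k2) / 2  # Heun, order 2
    return y_high, abs(y_high - y_low)


def step_accepted(err_est, y, atol, rtol):
    """Assumed mixed absolute/relative test: accept if err_est <= atol + rtol * |y|."""
    return err_est <= atol + rtol * abs(y)


def decay(t, y):
    """Test problem y' = -y; with y(0) = 1 the exact solution is exp(-t)."""
    return -y


h = 0.1
y_new, err_est = heun_euler_step(decay, 0.0, 1.0, h)
true_err_euler = abs(math.exp(-h) - (1.0 - h))  # actual error of the Euler solution
true_err_heun = abs(math.exp(-h) - y_new)       # actual error of the solution kept

print(f"estimated error:         {err_est:.2e}")
print(f"true error (Euler):      {true_err_euler:.2e}")
print(f"true error (Heun, kept): {true_err_heun:.2e}")
print("accepted with tolerances 1e-6:", step_accepted(err_est, y_new, 1e-6, 1e-6))
```

Here the estimate (about 5.0e-03) closely tracks the true error of the Euler solution but is roughly 30 times larger than the true error of the Heun solution that is actually kept, and the step is rejected at tolerances of 1e-6. In other problems an estimate can just as well understate the true error.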
It is also written: “Relative tolerances are relative to the solution value, whereas absolute tolerances is the maximum absolute error allowed in a solution”.
- Are you sure that the latter is true?
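A small Python experiment (a toy adaptive integrator with hypothetical names, not Stan's solver) suggests why the second half of that sentence is doubtful: every accepted step keeps its estimated local error below atol, yet the error at the final time is a different quantity. Errors made in earlier steps are carried forward and, for a growing solution such as y' = y, amplified. rtol is set to 0 here so that only atol controls the steps.

```python
import math


def integrate_adaptive(f, t0, y0, t_end, atol, rtol, h0=1e-3):
    """Adaptive Euler integration controlled by a Heun-Euler error estimate.

    A step is accepted only if its estimated local error is at most
    atol + rtol * |y|. The 1st-order (Euler) value is kept so that the
    estimate refers to the solution actually propagated.
    """
    t, y, h = t0, y0, h0
    while t < t_end:
        h = min(h, t_end - t)  # do not step past the end time
        k1 = f(t, y)
        k2 = f(t + h, y + h * k1)
        y_low = y + h * k1
        y_high = y + h * (k1 + k2) / 2
        err = abs(y_high - y_low)
        tol = atol + rtol * abs(y_low)
        if err <= tol:
            t, y = t + h, y_low
        # The local error of Euler scales like h^2, hence the square root.
        h *= min(5.0, max(0.2, 0.9 * math.sqrt(tol / max(err, 1e-300))))
    return y


atol = 1e-6
# y' = y, y(0) = 1 on [0, 5]; the exact solution is exp(t).
y_end = integrate_adaptive(lambda t, y: y, 0.0, 1.0, 5.0, atol=atol, rtol=0.0)
global_err = abs(y_end - math.exp(5.0))
print(f"global error at t = 5: {global_err:.2e} (atol = {atol:.0e})")
```

On this problem the final error comes out several orders of magnitude above atol, which is consistent with atol bounding a per-step error estimate rather than the overall error of the solution.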
The sentence continues: “… and max_num_steps specifies the maximum number of steps the solver will take between output time points before throwing an error.”
- I think it’s misleading to talk about throwing an error, at least for cmdstan-2.25. When I set max_num_steps to a very low value like 10, I only get this now and then during warmup:
Chain 4 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 4 Exception: Exception: integrate_ode_rk45: Failed to integrate to next output time (1) in less than max_num_steps steps
So the error just causes a proposal to be rejected, and all chains still finish. If max_num_steps is even lower, like 3, I get a lot more of the above message and, at the end, also:
Warning: 2848 of 3000 (95.0%) transitions ended with a divergence
10 of 3000 (0.0%) transitions hit the maximum treedepth limit of 10 or 2^10-1 leapfrog steps.
Warning: Chain 4 finished unexpectedly!
So I guess these rejections are reported as divergences when they occur after warmup. The documentation also says “The maximum number of steps can be used to stop a runaway simulation,” but I don’t see how it stops one.
- It would be good to give some reasoning for why the default control parameters are what they are, and in which cases they are appropriate. They are most likely not appropriate if the solution is on the nanoscale, as was observed here.
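To make the nanoscale case concrete, here is a short Python calculation. It assumes relative and absolute tolerances of 1e-6 (which I believe are the integrate_ode_rk45 defaults) and the mixed test err <= atol + rtol * |y| (a common form; the exact test in Stan's solvers may differ). The helper name is hypothetical. The per-step error allowance barely shrinks as the solution shrinks:

```python
def allowed_local_error(y, atol, rtol):
    """Per-step error allowance under the assumed mixed test atol + rtol * |y|."""
    return atol + rtol * abs(y)


atol, rtol = 1e-6, 1e-6  # assumed default tolerances
for scale in (1.0, 1e-3, 1e-9):
    allowance = allowed_local_error(scale, atol, rtol)
    # Compare the allowance with the size of the solution itself.
    print(f"|y| = {scale:.0e}: allowance {allowance:.2e}, "
          f"i.e. {allowance / scale:.1e} times |y|")
```

For |y| around 1e-9 the allowance is about 1000 times the solution itself, so the error test hardly constrains the step size at all; this suggests atol has to be chosen relative to the scale of the solution for such problems.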
The docs say “The tolerances should be small enough so that setting them lower does not change the statistical properties of posterior samples generated by the Stan program but large enough to avoid unnecessary computation.”
- I think this also applies to max_num_steps to some extent. In any case, is there a way to actually check whether the statistical properties have changed, and how much change is acceptable? @bbbales2, @avehtari and I have a case study and paper coming up on how to do this efficiently, but I’d like to know if some other way to do it already exists.