Hi!
Triggered by a recent report, we are considering adding a check_finite call to the output of every ODE RHS evaluation. I am skeptical of this, since it will probably add noticeable overhead: the ODE RHS gets called many times. To measure the cost of the extra check, I used the hornberg ODE example as a benchmark; it is very stiff and has 8 states. I ran it a number of times with the bdf and RK45 integrators under different settings:
- 3 replications per case
- running multiple instances in parallel of the same test case (1, 4 and 8)
- with and without checking (the only difference between the two ODE RHS functions is a call to stan::math::check_finite)
- without any sensitivities, so that AD won’t obscure the result
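To make the comparison concrete, here is a minimal sketch of what the two RHS variants look like. It is not the hornberg system and it does not use Stan itself: `check_finite_sketch` is a hypothetical stand-in that only mimics the idea behind stan::math::check_finite (throw if any element is NaN or infinite), and the two-state linear RHS and the names `rhs_unchecked` / `rhs_checked` are made up for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for stan::math::check_finite (assumption, not the
// Stan implementation): throws std::domain_error if any element of y is
// NaN or infinite.
inline void check_finite_sketch(const char* function, const char* name,
                                const std::vector<double>& y) {
  for (std::size_t i = 0; i < y.size(); ++i) {
    if (!std::isfinite(y[i])) {
      throw std::domain_error(std::string(function) + ": " + name + "["
                              + std::to_string(i + 1) + "] is not finite");
    }
  }
}

// Toy two-state linear RHS used only for illustration (not hornberg).
inline std::vector<double> rhs_unchecked(double /* t */,
                                         const std::vector<double>& y,
                                         double k) {
  return {-k * y[0], k * y[0] - y[1]};
}

// Same RHS, but the output is scanned on every call. This extra pass over
// the output is the only difference between the two variants.
inline std::vector<double> rhs_checked(double t, const std::vector<double>& y,
                                       double k) {
  std::vector<double> dydt = rhs_unchecked(t, y, k);
  check_finite_sketch("rhs_checked", "dydt", dydt);
  return dydt;
}
```

For finite inputs both functions return the same values; the checked one additionally throws as soon as a state derivative becomes non-finite.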
I did the second point to get an idea of how concurrent chains behave on my 4-core laptop, which has hyperthreading.
Attached is a plot of the results. In numbers, I get an average performance degradation of 7% with 4 concurrent runs and ~6% with 8 concurrent runs. With a single run I don’t get a measurable difference; I should increase the sample size here, but I would be surprised to see a noticeable difference.
All in all, I would not like to slow down the ODE code for all users by up to 7% just to repeat the same check over and over (checking only the initial state is a good compromise, I think). Moreover, I think this example shows that checking does cost performance, and we should consider having a non-checking Stan mode that experts can enable.
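One way such a non-checking mode could look is a compile-time switch, so that the unchecked path carries no run-time cost at all. The sketch below is only an illustration of that idea using tag dispatch; `checked_mode`, `unchecked_mode`, `validate`, and `call_rhs` are hypothetical names and nothing like this exists in Stan as far as this post is concerned.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical policy tags selecting the behaviour at compile time.
struct checked_mode {};
struct unchecked_mode {};

// Checked policy: scan the output and throw on NaN or infinity.
inline void validate(const std::vector<double>& v, checked_mode) {
  for (std::size_t i = 0; i < v.size(); ++i) {
    if (!std::isfinite(v[i])) {
      throw std::domain_error("ODE RHS output is not finite");
    }
  }
}

// Unchecked ("expert") policy: an empty function the compiler can inline away.
inline void validate(const std::vector<double>&, unchecked_mode) {}

// Calls the user-supplied RHS functor f(t, y) and validates its output
// according to Mode.
template <typename Mode, typename F>
std::vector<double> call_rhs(const F& f, double t,
                             const std::vector<double>& y) {
  std::vector<double> dydt = f(t, y);
  validate(dydt, Mode{});
  return dydt;
}
```

An integrator would then call `call_rhs<checked_mode>(f, t, y)` by default and `call_rhs<unchecked_mode>(f, t, y)` when the expert mode is enabled.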
Best,
Sebastian
perf_checking_hornberg.pdf (7.9 KB)
perf_checking_hornberg.csv (5.9 KB)