I was wondering whether we have any understanding of how limited accuracy in the derivatives affects the performance of HMC. What got me thinking is this:
Essentially, running the ODE integrator at low precision killed the performance of HMC. Increasing the precision of the ODE integrator gave nice results in reasonable time. Hence I am wondering about the general question raised above, and whether it might pay off to add an integrator that can compute high-precision ODE solutions quickly (odeint has one of those on offer, as I recall).
I don’t understand the question here. Don’t we always want something
that is fast and precise? I don’t think there’ll be a general answer
about precision necessary for HMC—it will depend on the geometry of
the posterior. But understanding this tradeoff is important and should
be discussed in the doc and tutorials.
I was expecting the answer to be “depends on the problem”, but was hoping for more. Since stan::math computes the derivatives of some of those complicated functions with only finite precision, I was wondering whether limited gradient precision is known to limit the performance of HMC. For example, I remember that some funky beta functions were optimized for higher precision recently.
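To make the question concrete, here is a minimal sketch in plain Python (not Stan code) of how gradient error can feed into the HMC acceptance rate. Everything in it is an assumption for illustration: the target is a 1-D standard normal, the gradient error is modeled as independent Gaussian noise of size `grad_error` (a crude stand-in, since real ODE solver error is deterministic and problem dependent), and the names `leapfrog` and `mean_accept_prob` are made up here.

```python
import math
import random


def leapfrog(q, p, grad, step, n_steps):
    """Leapfrog integration of H(q, p) = U(q) + p^2 / 2 using the supplied gradient."""
    p -= 0.5 * step * grad(q)          # initial half step in momentum
    for i in range(n_steps):
        q += step * p                  # full step in position
        if i < n_steps - 1:
            p -= step * grad(q)        # full step in momentum
    p -= 0.5 * step * grad(q)          # final half step in momentum
    return q, p


def mean_accept_prob(grad_error, n_traj=2000, step=0.1, n_steps=20, seed=1):
    """Average Metropolis acceptance probability for U(q) = q^2 / 2.

    ASSUMPTION: gradient error is modeled as additive Gaussian noise with
    standard deviation grad_error; this is only a toy error model.
    """
    rng = random.Random(seed)

    def noisy_grad(q):
        # The exact gradient of U(q) = q^2 / 2 is q.
        return q + grad_error * rng.gauss(0.0, 1.0)

    q, total = 0.0, 0.0
    for _ in range(n_traj):
        p = rng.gauss(0.0, 1.0)                        # resample momentum
        h_old = 0.5 * q * q + 0.5 * p * p              # exact Hamiltonian before
        q_new, p_new = leapfrog(q, p, noisy_grad, step, n_steps)
        h_new = 0.5 * q_new * q_new + 0.5 * p_new * p_new
        accept = math.exp(min(0.0, h_old - h_new))     # min(1, exp(-delta H))
        total += accept
        if rng.random() < accept:
            q = q_new
    return total / n_traj


if __name__ == "__main__":
    for err in (1e-8, 1e-4, 1e-2, 1.0):
        print(f"grad_error={err:g}  mean accept prob={mean_accept_prob(err):.4f}")
```

The acceptance step uses the exact Hamiltonian while the trajectory uses the perturbed gradient, which mirrors the situation where the log density is accurate but its gradient is not. How well this toy error model carries over to a real ODE model is exactly the open question.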
High precision usually comes at the cost of speed. In the case of ODEs this means choosing the right integrator. According to what I find on the net, the non-stiff RK45 integrator is very fast for low to medium accuracy (absolute and relative tolerances from 10^-3 to 10^-8). If you want more precise solutions, other integrators outperform RK45. The Bulirsch-Stoer (BS) integrator is part of odeint and is known to be very fast for high-accuracy ODE solutions (tolerances below 10^-8). As I learnt recently, BS may also work on stiff problems. Adding BS would be trivial since it is part of odeint, but before trying this out I want to get a feeling for whether it is worth the trouble (I know everyone will sigh: another ODE solver). I remember doing some test runs a while ago comparing RK45 to BS, and I recall that BS was indeed a lot faster at those high precisions. Back then I never thought that 10^-10 absolute and relative precision would ever be needed, but I could be wrong.
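As a rough illustration of why extrapolation methods are attractive at tight tolerances, the following plain-Python sketch counts right-hand-side evaluations needed to hit a given absolute error on a toy problem. It is not odeint's `bulirsch_stoer` (which adapts both order and step size) and not Stan's RK45: it compares a fixed-step classical RK4 against a fixed-order extrapolation of Gragg's modified midpoint rule, which is the building block behind BS. The test problem, the substep sequence `SEQ`, and all function names are assumptions chosen for this sketch.

```python
import math

SEQ = (2, 4, 6, 8, 10, 12)                    # substep counts (assumed sequence)
EXTRAP_EVALS = sum(n + 1 for n in SEQ)        # RHS evaluations per extrapolation step


def f(t, y):
    # Toy problem y' = y * cos(t), y(0) = 1, with exact solution exp(sin(t)).
    return y * math.cos(t)


def rk4_step(t, y, h):
    """One classical fourth-order Runge-Kutta step (4 RHS evaluations)."""
    k1 = f(t, y)
    k2 = f(t + 0.5 * h, y + 0.5 * h * k1)
    k3 = f(t + 0.5 * h, y + 0.5 * h * k2)
    k4 = f(t + h, y + h * k3)
    return y + h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0


def midpoint(t, y, H, n):
    """Gragg's modified midpoint rule across H with n substeps (n + 1 evaluations)."""
    h = H / n
    z_prev, z = y, y + h * f(t, y)
    for m in range(1, n):
        z_prev, z = z, z_prev + 2.0 * h * f(t + m * h, z)
    return 0.5 * (z + z_prev + h * f(t + H, z))


def extrap_step(t, y, H):
    """Aitken-Neville extrapolation in h^2 of the midpoint results over SEQ."""
    row = []
    for k, n in enumerate(SEQ):
        new_row = [midpoint(t, y, H, n)]
        for j in range(1, k + 1):
            ratio = (n / SEQ[k - j]) ** 2
            new_row.append(new_row[j - 1] + (new_row[j - 1] - row[j - 1]) / (ratio - 1.0))
        row = new_row
    return row[-1]                             # highest-order estimate


def evals_to_reach(step_fn, evals_per_step, tol, t_end=2.0):
    """Double the number of equal steps until the absolute end-point error is <= tol."""
    exact = math.exp(math.sin(t_end))
    n_steps = 1
    while n_steps <= 2 ** 16:
        h = t_end / n_steps
        y = 1.0
        for i in range(n_steps):
            y = step_fn(i * h, y, h)
        if abs(y - exact) <= tol:
            return n_steps * evals_per_step    # cost of the successful run only
        n_steps *= 2
    raise RuntimeError("tolerance not reached")


if __name__ == "__main__":
    for tol in (1e-3, 1e-6, 1e-10):
        print(tol, evals_to_reach(rk4_step, 4, tol),
              evals_to_reach(extrap_step, EXTRAP_EVALS, tol))
```

Counting evaluations rather than wall time keeps the comparison independent of the machine, but it says nothing about robustness or about the extra cost of the sensitivity equations that Stan also has to solve, so a real benchmark would still be needed.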
I don’t know the answer. But if you can validate with real problems
that the BS integrator is faster (while remaining as robust), then we
can add it. The point of getting the back-end abstraction right is to make
these things easy to add. Having said that, I don’t just want to include
integrators for a sense of completeness—they have to justify the code,
doc, tests, and ongoing support they’ll need.
Great. We are on the same page here… I will only suggest adding this if I find a good use case that I can back up with a solid benchmark against what we have.
… I thought Michael would have some thoughts on this, as he had (or rather still has, I think) big concerns about ODE accuracy, which BS might address.