Relationship between stan::math and boost::math

In the trigamma thread @wds15 said (added parentheticals in square brackets for clarity):

Unless you say that boost is terrible, we should just switch [to boost]. Boost math folks actively maintain this function. So it’s better to let them maintain and test it. If it’s bad then we should file an issue to them. The future maintenance burden is with them which is great for us. That’s why I would switch without thinking at this point (one more look at performance would be good).

I agree with him about this in general (we use boost so heavily there’s no reason for us to duplicate their functionality, especially since they take performance and accuracy seriously). In practice I think this means:

  1. When Boost functions are close in terms of error and performance, we should just rely on their functions

  2. When we want to deviate but the issue could be addressed upstream, we should check whether they're willing to accept a pull request with the modifications

  3. For already-implemented functions like the trigamma, I think we should demonstrate to ourselves (some sort of minor doc, maybe in the wiki) that we're making the change for a reason (similar performance and less code is a fine reason!)

For derivatives this isn’t going to be all that helpful since we’re not about to change to their autodiff and they don’t have many of the derivatives we need. We can always back out of relying on boost::math if they go in a different direction for some reason.

That does change my perspective on the Stan math library a little. The core math functionality we often rely on boost::math for is fun, but it's not really a good niche to compete in (there's not much reason to). In autodiff, and in the functions needed to support it, we could do very well and differentiate ourselves at the same time.

@syclik is that similar to how you’re currently thinking about the relationship between boost and stan math libs? I’d tag more of the math lib devs but I’ve missed the recent meetings so I’m not sure who to include.


To put their turnaround time into perspective: the overflow bug in lgamma was fixed within 3 hours of being reported! That may not generalize, but it shows the effort they put into this.

For derivatives…you are right that things don't generalize so easily. However, since Boost 1.71, Boost Math supports forward-mode automatic differentiation. So we can always resort to that as a last option.

(Specifically for the lgamma, Boost Math has the polygamma function: polygamma(n, x) is the nth derivative of the digamma, i.e. the (n+1)th derivative of the lgamma)

This is quite true, for example, with the hypergeometric functions that were merged into Boost Math after 1.70 was frozen. We have our own versions for some derivative calculations, and maybe we have done some additional checking for those cases, but in general, Boost has put in a lot more effort to make sure the hypergeometric functions are as accurate as possible.

@wds15 @bgoodri yeah, I agree in general, (and for the trigamma case definitely). When we make changes I still think it makes sense to doc them so we don’t screw up somebody’s models without notification.

Do we need to revise any other functions that previously used boost::math and were replaced?

A quick rundown on what was replaced:

  • asinh
  • cbrt
  • erf
  • erfc
  • expm1
  • isinf
  • isnan
  • round
  • trunc
  • acosh
  • atanh
  • lgamma (a PR from @wds15 is open & ready)
  • log1p
  • tgamma

I agree on all three points. For 1., it would be great if we could come to an agreement on what "close" means in terms of performance. Is it 2%, 5%, 10%? And what performance hit do we accept when we decide to branch on OS-specific implementations (on Windows use X because it's faster, on Linux use Y, etc.)? Just to prevent 100+ comment PRs for seemingly simple changes like this one.

It's probably a similar discussion for what "close" means in terms of precision.

This is what I believe we were thinking about. The preference is in this order:

  1. Standard Template Library (STL)
  2. Boost Math
  3. CMath or other builtin C / C++ library that’s portable
  4. custom implementation

We would use something with a lower preference over a higher preference if there's a good reason for it. For example, we write our own normal distribution function because it's not available in the STL; it is available in Boost, but our implementation has specializations for reverse-mode autodiff. If that last condition no longer holds, we should just use Boost and not maintain our own custom implementation.

+1. I’d rather solve this with a policy rather than have to evaluate case-by-case.

Up until now, the unofficial criterion has been a strict 0% performance regression. I believe this is unreasonable, but in cases where there really is a 0% performance regression across everything we care about, we can move forward (from the performance-evaluation aspect).

In addition to that, we expect there to be no side-effects from changes in code. If there are side-effects, then it has to be clearly stated, there has to be a good reason, or we expect the implementation to change to reduce side-effects. I don’t think that policy needs to change.

The change from boost::lgamma to std::lgamma did not show a performance regression at the time because we were not checking threaded performance. Once we had switched to std::lgamma, reverting to boost::lgamma was problematic because side-effects were discovered: there were numerical precision issues.

Re: preventing 100+ comment PRs. One step is to treat the reporting of side-effects and regressions seriously. The other step we should take is to have guidelines on when we would accept a speed regression for a different implementation. (It's hard because it's always going to be a tradeoff, and a generic rule may not be great.) But for this PR: if it had been stated up front that there was a performance regression for single-core usage, plus a side-effect of numerical issues in other functions requiring changes to tests, then we could have focused on whether and why that's acceptable, rather than discovering it midway and having an unstructured discussion about these things.

Personally, I’d prefer if we worked in that manner: as a PR proposer, if you discover side-effects, you state them. If they are discovered in the process, they are summarized and the PR is reevaluated with that knowledge. For anything without side-effects and performance regressions, that part of the conversation is just skipped.

Only if we find a reason to move back. It would be good to start having a process that would enable us to evaluate whether the conditions for why we’ve made exceptions to the priority of STL > Boost > CMath > custom still hold. We haven’t been good about that.

Most of the time, we just add new functions that don’t have counterparts anywhere else.