Thanks @vincent-picaud for looking into this. Your code is very elegant and I've obviously missed many things that could make my life easier. On the other hand, your code is probably a bit incomprehensible to people who don't know Mathematica, so I have to think about this tradeoff: to what extent do we need people to understand the "meta" code that generates the test values? Thanks a lot, there really is a lot of food for thought for me.
Also thanks @bgoodri for your view. I'll first address the tone (as I perceived it) and then the content. On first reading, your post came across as quite combative, and my impression was a mix of "tests against Mathematica are worse than no tests" and "don't write any tests until you learn how to do it properly". I believe this was not your intention and that my reading was at least partially an artifact of my state of mind at the time. I however wanted to highlight that this is how some people might read it, which might (or might not) be a useful data point for you.
For the content:
I am trying to approach this from a pragmatic viewpoint. I think developer efficiency in writing tests is really important. My interest in this was sparked by the fact that many (most?) functions/codepaths in Stan math currently have no tests for numerical precision, or only very weak ones.
IMHO you are making the perfect the enemy of the good here. I totally agree that tests against precomputed values shouldn't be the only tests we do; that's why I also wrote some tests against identities for the lbeta PR (a sketch of what I mean is below). But in my experience, formula-based and precomputed-value tests are complementary. In my recent struggle with neg_binomial_2 and surrounding functions, both approaches led me to identify some of the issues.
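To make "tests against identities" concrete, here is a minimal sketch of what such a test can look like. This is not the actual test from the PR; the argument grid and tolerances are illustrative assumptions. It relies on two exact identities: symmetry, lbeta(a, b) = lbeta(b, a), and the recurrence lbeta(a + 1, b) = lbeta(a, b) + log(a) - log(a + b), which follows from B(a+1, b) = B(a, b) * a/(a+b).

```cpp
#include <stan/math.hpp>
#include <gtest/gtest.h>
#include <cmath>
#include <vector>

TEST(MathFunctions, lbeta_identities_sketch) {
  using stan::math::lbeta;
  // Mix tiny, moderate, and huge arguments; purely illustrative grid.
  std::vector<double> args{1e-8, 0.5, 3.0, 1e3, 1e8};
  for (double a : args) {
    for (double b : args) {
      double v = lbeta(a, b);
      // Symmetry should hold essentially exactly.
      EXPECT_NEAR(v, lbeta(b, a), 1e-14 * (1 + std::fabs(v)));
      // Recurrence: B(a+1, b) = B(a, b) * a / (a + b), taken to the log scale.
      // The tolerance is a loose illustrative choice, not a calibrated bound.
      EXPECT_NEAR(lbeta(a + 1, b), v + std::log(a) - std::log(a + b),
                  1e-10 * (1 + std::fabs(v)));
    }
  }
}
```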
For me the biggest advantage of precomputed values is that such tests are quite easy to create. Let's say I fear that lbeta fails with one very large and one very small argument. So I precompute values for this case in high precision and check; not bulletproof, but quick. I am sure a clever formula-based test would catch this as well, but I can't come up with one quickly.
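A sketch of that workflow is below. In real use the constants would be pasted from a high-precision environment, e.g. the output of N[Log[Beta[a, b]], 40] in Mathematica; to keep this sketch self-checking I picked arguments where the exact value is known in closed form (B(a, 1) = 1/a, so lbeta(a, 1) = -log(a)). The tolerance is illustrative.

```cpp
#include <stan/math.hpp>
#include <gtest/gtest.h>

TEST(MathFunctions, lbeta_precomputed_sketch) {
  using stan::math::lbeta;
  // Reference value derived exactly: lbeta(1e30, 1) = -30 * log(10).
  // In a real test this constant would come from a high-precision run.
  EXPECT_NEAR(lbeta(1e30, 1.0), -69.07755278982137052, 1e-12);
  // The mirrored case with the tiny argument: lbeta(1e-30, 1) = 30 * log(10).
  EXPECT_NEAR(lbeta(1e-30, 1.0), 69.07755278982137052, 1e-12);
}
```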
Another issue with formulas is that I always have to worry about the numerics of the formula itself. For example, checking lbeta(a, b) against lgamma(a) + lgamma(b) - lgamma(a + b) in double precision is pointless for large arguments, because that naive formula suffers exactly the catastrophic cancellation a careful lbeta implementation is meant to avoid.
And there are functions for which good formulas are hard to come by, for example the lgamma_stirling_diff function I wrote (defined as the difference between lgamma and its Stirling approximation). Here any test actually involving lgamma is weak, because the interesting use cases for lgamma_stirling_diff are when lgamma is large, so a double-precision lgamma reference simply doesn't have enough significant digits left to resolve the small difference we care about (see the sketch below). I believe you may come up with some clever way around this, but using precomputed values let me build a reasonably strong test quickly.
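To illustrate why a double-precision lgamma reference is hopeless here, a self-contained sketch; lgamma_stirling is re-derived locally from the usual Stirling formula rather than taken from Stan math:

```cpp
#include <cmath>
#include <cstdio>

// Stirling approximation to lgamma: 0.5*log(2*pi) + (x - 0.5)*log(x) - x.
double lgamma_stirling(double x) {
  constexpr double LOG_TWO_PI = 1.8378770664093453;  // log(2*pi)
  return 0.5 * LOG_TWO_PI + (x - 0.5) * std::log(x) - x;
}

int main() {
  const double x = 1e10;
  // Naive reference for lgamma_stirling_diff: subtracting two numbers of
  // magnitude ~2.2e11 in double precision.
  const double naive = std::lgamma(x) - lgamma_stirling(x);
  // Leading term of the Stirling series: the true diff is ~1/(12*x).
  const double expected_scale = 1.0 / (12.0 * x);  // ~8.3e-12
  // The subtraction carries absolute error ~|lgamma(x)| * 1e-16 ~ 2e-5,
  // many orders of magnitude larger than the value being tested.
  std::printf("naive reference: %g\nexpected scale:  %g\n",
              naive, expected_scale);
  return 0;
}
```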
Using Arb might well be a better way forward in the long run, but I personally lack the skill and patience to work with C/C++ more than I have to. The cloud Mathematica is a good enough tool for me in that it lets me iterate quickly (and I already use it to check my symbolic manipulations). If someone integrated Arb with the test code in Stan math, then it would be a big advantage for Arb, as the tests wouldn't need to rely on external code (a rough sketch of what I imagine is below). But I don't think that has been done yet.
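For concreteness, here is a rough, untested sketch of what such an integration could look like, assuming the Arb C API as documented at arblib.org; the helper name lbeta_reference is mine, not an existing Stan math or Arb function:

```cpp
#include <arb.h>

// Computes lbeta(a, b) = lgamma(a) + lgamma(b) - lgamma(a + b) at `prec`
// bits and rounds the result to double. At 256 bits the cancellation in
// the naive formula is harmless for any arguments representable in double.
double lbeta_reference(double a, double b, slong prec = 256) {
  arb_t xa, xb, xab, res;
  arb_init(xa); arb_init(xb); arb_init(xab); arb_init(res);

  arb_set_d(xa, a);
  arb_set_d(xb, b);
  arb_add(xab, xa, xb, prec);  // a + b

  arb_lgamma(xa, xa, prec);    // lgamma(a)
  arb_lgamma(xb, xb, prec);    // lgamma(b)
  arb_lgamma(xab, xab, prec);  // lgamma(a + b)

  arb_add(res, xa, xb, prec);
  arb_sub(res, res, xab, prec);

  double out = arf_get_d(arb_midref(res), ARF_RND_NEAR);
  arb_clear(xa); arb_clear(xb); arb_clear(xab); arb_clear(res);
  return out;
}
```

A test could then do something like EXPECT_NEAR(stan::math::lbeta(a, b), lbeta_reference(a, b), tol) directly, with no hand-pasted constants.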
You also seem to imply that Mathematica can provide wrong results in cases where Arb would work. That would be a good reason to prefer Arb over Mathematica. I tried to quickly Google for some evidence in this regard and didn’t find any.
The above really is just to clear my mind, as I had some negative emotions about your initial reply. I am not sure this thread should be turned into a discussion about best practices for tests in Stan math. If you think such a discussion is warranted, it might be better to start a new thread. I briefly considered starting a new thread myself, but I am not sure you want to be dragged into such a discussion at this point, or that it is a useful topic to spend dev attention on right now, so I'll let you make the call :-) :evading-responsibility-emoji: