I’m not saying we have to go into this level of detail here, but I’d like us to know, when we switch out math functions, why we do it. So here’s what I did, using trigamma as an example:
I wrote a quick test file (appended below so the plot labels are obvious). It takes a really naive approach: an equally spaced sequence of inputs for both functions, which is silly but good enough for this machine and a first shot. The downside is that the output is 80 GB, so I can only load sections into R for plotting at a time.
When looking at the output I think it’s good to focus on a few things:

settings: I haven’t looked up which version you wanted me to run (re: the double/long double conversion). I can do that in the next round.

having something to compare against: ideally an implementation whose error we know, or at least multiple implementations. I imagine boost’s implementation is better, but I would like to grab reference values or values from another implementation. Someone mentioned a C library that’s widely respected for special functions; I found it once but don’t have the link right now (@bgoodri, if you know what I’m talking about, I’d include it in the comparison).

finding the regions with problems: for this run it looks like even the absolute error approaches 2e9 as x approaches zero, so maybe the branch where we accumulate 1/x^2 is a problem.

our implementation looks like it might have some easy problems to fix: we accumulate 1/x^2 starting from the smallest x value, which might be a problem because it means we start with the largest 1/x^2 terms. So maybe just fixing that would be good. Here’s what we do:
z = x;
value = 0.0;
while (z < large) {
  value += 1.0 / (z * z);
  z += 1.0;
}
For example, if you run that section of code in R for x = 0.001 you get this answer:
> sprintf("%10.20f", f(0.001))
[1] "1000001.64243319106753915548"
If you construct a vector of the ‘value’ terms and then sum it using R’s sum function, you get something different:
> sprintf("%10.20f", g(0.001))
[1] "1000001.64243319083470851183"
And if you sum from the smallest to the largest values you get what R’s sum gives:
> sprintf("%10.20f", h(0.001))
[1] "1000001.64243319083470851183"
or we could use a Kahan sum (https://en.wikipedia.org/wiki/Kahan_summation_algorithm#Further_enhancements) for things like this in general. It’s easy to implement and there are some modern versions.

I should add timings to that C++ code so we know what the performance implications of making a switch are.

This should be easy, so I was going to take that test code and write it out as a C++ template function so it’s simple to check any given function and produce the required output. Most of that test code is boilerplate.
Here’s the C++ test code:
#include <iomanip>
#include <iostream>
#include <limits>
#include <stan/math/prim/scal.hpp>
#include <stan/math/prim/scal/fun/trigamma.hpp>
#include <boost/math/special_functions/trigamma.hpp>

int main(int argc, char** argv) {
  double start = 0;
  double stop = 1e6;  // std::numeric_limits<double>::max();
  int n_steps = 1e9;
  double step = (stop - start) / double(n_steps);
  double x = start;
  double stan_trg;
  double boost_trg;
  double stan_boost_trg;
  std::cout << "x, stan::math::trigamma(x), boost::math::trigamma(x), difference::abs";
  std::cout << ", difference::rel \n";
  while (x >= start && x < stop) {
    try {
      stan_trg = stan::math::trigamma_impl(x);
    } catch (const std::exception&) {
      stan_trg = std::numeric_limits<double>::quiet_NaN();
    }
    try {
      boost_trg = boost::math::trigamma(x);
    } catch (const std::exception&) {
      boost_trg = std::numeric_limits<double>::quiet_NaN();
    }
    try {
      stan_boost_trg = stan::math::trigamma_impl(x) - boost::math::trigamma(x);
    } catch (const std::exception&) {
      stan_boost_trg = std::numeric_limits<double>::quiet_NaN();
    }
    std::cout << std::setprecision(std::numeric_limits<double>::digits10) << x << ", ";
    std::cout << std::setprecision(std::numeric_limits<double>::digits10) << stan_trg << ", ";
    std::cout << std::setprecision(std::numeric_limits<double>::digits10) << boost_trg << ", ";
    std::cout << std::setprecision(std::numeric_limits<double>::digits10) << stan_boost_trg << ", ";
    std::cout << std::setprecision(std::numeric_limits<double>::digits10) << stan_boost_trg / boost_trg << "\n";
    x = x + step;
  }
  return 0;
}
Compiles on ‘develop’ with:
g++ -I . -I ./lib/eigen_3.3.3 -I ./lib/boost_1.69.0 -o switchtest ./switchtest.cpp
The .csv (as written; easy to modify in the C++ code) is 80 GB of text, and you can load sections in R (‘skip’ controls how many rows are skipped, ‘nrows’ controls how many rows to load) with:
d = read.csv('test.csv', header = TRUE, colClasses = 'numeric', check.names = FALSE, nrows = 10, skip = 10)
I’m going to be out for a while, but I’ll see if I can’t write some more sensible code this afternoon that includes timing.
In terms of what evidence I think we would want: a multi-function comparison showing that whatever version we pick has low error and doesn’t tank performance.
@wds15 can you easily come up with a Stan line that kicks off this calculation in derivatives? I would throw together a test model that focuses on a trigamma region with problems to see if it matters.
@bbales: where in trigamma did you see problems?