Do we need more unit tests for distributions?

Recently a number of both subtle and not-so-subtle issues were reported (partly by me) for multiple probability distributions:

https://github.com/stan-dev/math/issues/1495, https://github.com/stan-dev/math/issues/1496, https://github.com/stan-dev/math/issues/1498, https://github.com/stan-dev/math/issues/1531

Even neg_binomial_2 is affected, which I thought was very “basic” and is presumably used in thousands of deployed models. Most of the issues I’ve seen would rarely invalidate a model completely, but I can see some of them causing occasional divergences (many of the issues could manifest as large discontinuities in the density for extreme parameters).

This might indicate we need more/better unit tests for distributions. As an example, when fixing the initial issue I found in neg_binomial_2, I discovered 4 additional issues and wrote or overhauled 7 tests to achieve some confidence that all code paths work (see https://github.com/stan-dev/math/pull/1497 for more details).

We rarely test the distributions for anything but “normal” parameter values. Extremes (very large, close to 0, …) are at most tested for not throwing or not returning nonsense, but not for accuracy. Many tests actually involve only a single parameter combination, and some distributions (e.g. the inverse gamma) have no tests for derivatives at all.

Some relatively simple ways to strengthen the tests are IMHO:

  • Iterate over a grid of parameter values spanning several orders of magnitude in all existing tests
  • Compute values for both the result and the derivatives in Mathematica and test against those
  • Whenever a discrete choice is made (as when delegating from the negative binomial to the Poisson), test continuity at the boundary
  • Test derivatives from autodiff against derivatives computed with complex step.

I’ve tried to build a few tests for some distributions that are less “obvious”, and all of them pass, which is nice (see https://github.com/stan-dev/math/pull/1575). So maybe neg_binomial_2 was an outlier with an unexpectedly tricky implementation while everything else is mostly OK?

I also understand that adding the tests might be a big effort and not necessarily worth it, so this is very much open for discussion.

6 Likes

OK, so I tweaked the tests for neg_binomial_lpmf a bit and I am quite convinced it, too, has issues with derivatives for extreme values: https://github.com/stan-dev/math/issues/1578

3 Likes

Maybe this could be a topic that a GSoC (Google Summer of Code) student could take on, at least for part of the work.

1 Like

I agree that we need to test the distributions better, including for derivatives.

I think the best course forward is to expand the finite differences test package. We can test against known derivatives in some cases, but it’s combinatorially prohibitive to do it by hand for first, second, and third-order derivatives using all of our autodiff options.

We need to think more about the possible accuracies we can get. It’s challenging to test over several orders of magnitude.

We won’t know until we test :-)

Very much worth it. I’d make testing a higher priority, but it’s hard to get funding for testing and hard to find people who want to do it.

Sure—anyone could do this. Is anyone applying for a GSoC student?

2 Likes

The Stan project could apply under the NumFOCUS umbrella.

https://summerofcode.withgoogle.com/

I know Stan tried this once before and didn’t get any students, but maybe this year it could work.

Last summer ArviZ had 2 students (I was mentoring one) and the experience was positive.

5 Likes

In my work on the tests for the negative binomial, testing against finite differences was very hard to set up so that it provided a strong test without producing false positives. Using boost::math::differentiation::complex_step_derivative allowed me to keep the relative accuracy of the test at 1e-8 (except for a few extreme cases), so I would suggest moving to complex step whenever possible. The downside is that this requires the densities to work with std::complex<double> substituted for all differentiable arguments. This may not work for some functions - notably lgamma(std::complex<double>) is not available - but workarounds are relatively easy to write.

We can. Someone told me that they put in a project to NumFOCUS but NumFOCUS dropped the ball and never listed it. I’m not 100% sure what the story was. So if you do go this way, make double sure it gets listed!

How do we know when it’s possible? A lot of our distributions use lgamma or even more complicated functions.

Can you explain more?

Sorry, I was unnecessarily mysterious.

There is a bunch of functions that don’t admit std::complex<double> as an argument by default, but I’ve successfully made them work well enough to compute complex-step derivatives of functions involving them:

  • log1p, expm1 and similar functions have quite short code and usually fail due to lines like if(x < 1e-4) { ... }. Providing a complex specialization often means just copying the code and replacing comparisons of values with comparisons of absolute values (e.g. if(abs(x) < 1e-4) { ... }) - the underlying approximations motivating many of the implementations are usually valid in the complex plane as well.
  • lgamma() and other more complicated functions can be approximated via a Taylor expansion around x.imag() == 0. This might not work when the argument to the function is already heavily transformed (e.g. lgamma(exp(15 * x))), but when the argument is “close” to the original (e.g. lgamma(x + 1)), the imaginary part will be the small “complex step” and the approximation holds. For lgamma this gives:
auto lgamma_c_approx = [](const std::complex<double>& x) {
  // First-order Taylor expansion in the imaginary direction:
  // d/dx lgamma(x) = digamma(x), so Im(lgamma(x + ih)) ~ h * digamma(x).
  return std::complex<double>(lgamma(x.real()),
                              x.imag() * boost::math::digamma(x.real()));
};

There are two possible approaches to computing complex step derivative with such custom implementations:

  • build specializations of the necessary library functions admitting std::complex, making the current implementations of the functions work with complex step. This could be a test-only library, as we probably don’t want to invest in vouching for 100% correctness of the implementations. On the other hand, since there is work on std::complex<var>, such functions might be necessary anyway. For lgamma (and possibly others) there are complex implementations we could easily use, e.g. the one on Wikipedia
  • write a specialized test-only version of the function to be tested (or its simplest formula) using the specialized functions - this is what I currently do for the neg_binomial_2_lpmf test against complex step

Hope that is more specific and actionable.

1 Like

Thanks! Yes, that’s clear.

I have a branch in stan-math that’s going to fold complex number support into Stan math for autodiff vars and matrices.

It’d be great if we could start expanding our library to handle complex inputs, even if we just have to define everything templated to start and autodiff through it. The branch above will provide support for autodiff for every function defined for std::complex in the standard library.

1 Like