Do we need more unit tests for distributions?

martinmodrak · January 2, 2020, 4:26pm

Recently a number of both subtle and not-so-subtle issues were reported (partly by me) for multiple probability distributions:

https://github.com/stan-dev/math/issues/1495, https://github.com/stan-dev/math/issues/1496, https://github.com/stan-dev/math/issues/1498, https://github.com/stan-dev/math/issues/1531

This involves even the neg_binomial_2, which I thought is very “basic” and is presumably used in thousands of deployed models. Now most of the issues I’ve seen would rarely completely invalidate a model, but I can see some of them causing divergences here and there (many of the issues could manifest as large discontinuities in density for extreme parameters).

This might indicate we need more/better unit tests for distributions. As an example, when fixing the initial issue I found in neg_binomial_2, I discovered 4 additional issues and wrote or overhauled 7 tests to achieve some confidence that all code paths work (see https://github.com/stan-dev/math/pull/1497 for more details).

We rarely test the distribution for anything but “normal” values of parameters - extremes (large, close to 0, …) are tested at most for not throwing anything or not returning nonsense, but not for accuracy. Many tests actually involve only one parameter combination. Some distributions (e.g. inv. gamma) have no tests for derivatives.

Some relatively simple ways to strengthen the tests are IMHO:

Iterate over a grid of parameter values spanning several order of magnitudes for all extant tests
Compute values for both the result and the derivatives in Mathematica and test against those
Whenever a discrete choice is made (as when delegating from Neg. binomial to Poisson), test continuity at the boundary
Test derivatives from autodiff against derivatives computed with complex step.

I’ve tried to build a few tests for some distributions that are less “obvious”, and all of them pass, which is nice (see https://github.com/stan-dev/math/pull/1575). So maybe neg_binomial_2 was an outlier with unexpectedly tricky implementation while everything else is mostly OK?

I also understand that adding the tests might be a big effort and not necessarily worth it, hence this is very much up to discussion.

martinmodrak · January 3, 2020, 9:17am

OK, so I tweaked a bit the tests for neg_binomial_lpmf and I am quite convinced it too has issues with derivatives for extreme values: https://github.com/stan-dev/math/issues/1578

ahartikainen · January 3, 2020, 9:09pm

Maybe this could be one topic that GSoC (Google Summer of Code) student could do, at least part of the work.

Bob_Carpenter · January 4, 2020, 11:36pm

I agree that we need to test the distributions better, including for derivatives.

I think the best course forward is to expand the finite differences test package. We can test against known derivatives in some cases, but it’s combinatorially prohibitive to do it by hand for first, second, and third-order derivatives using all of our autodiff options.

We need to think more about the possible accuracies we can get. It’s challenging to test over several orders of magnitude.

We won’t know until we test :-)

Very much worth it. I’d make testing a higher priority, but it’s hard to get funding for testing and hard to find people that want to do it.

Sure—anyone could do this. Is anyone applying for a GSoC student?

ahartikainen · January 5, 2020, 12:09am

Stan project could apply one under NumFocus umbrella.

I know Stan tried this once before, but didn’t get any students, but maybe this year it could work.

Last summer ArviZ had 2 students (I was mentoring one) and the experience was positive.

martinmodrak · January 6, 2020, 12:13pm

In my work with the tests for neg. binomial, testing against finite differences was very hard to setup to both provide strong test while not producing false positives. Using boost::math::differentiation::complex_step_derivative allowed me to keep the relative acuraccy for the test at 1e-8 (except for few extreme cases) so I would suggest moving to complex step whenever possible. The downside is that this requires the densities to work with std::complex<double> substituted for all differentiable arguments. This may not work for some functions - notably lgamma(std::complex<double>) is not available, but workarounds are relatively easy to write.

Bob_Carpenter · January 15, 2020, 2:01am

We can. Someone told me that they put in a project to NumFOCUS but NumFOCUS dropped the ball and never listed it. I’m not 100% sure what the story was. So if you do go this way, make double sure it gets listed!

How do we know when it’s possible? A lot of our distributions use lgamma or even more complicated functions.

Can you explain more?

martinmodrak · January 15, 2020, 1:16pm

Sorry, I was unnecessarily mysterious.

There is a bunch of functions that don’t admit std::complex<double> as argument by default, but I’ve succesfully made them work enough to compute complex step derivative of functions involving them:

log1p, expm1 etc. those have quite short code and usually fail due to lines like if(x < 1e-4) { ... }, providing a complex specialization often means just copying the code and replacing comparison of values with comparison of absolute value (e.g. if(abs(x) < 1e-4) { ... }) - the underlying approximations motivating many of the implementations are usually valid in complex plane as well.
lgamma() and other complex functions can be approximated via Taylor expansion around x.imag() == 0. This might not work when the argument to the function is already heavily transformed (e.g. lgamma(exp(15 * x)), but when the argument is “close” to the original (e.g. lgamma(x + 1)), the imaginary part will be the small “complex step” and the approximation holds. For lgamma this gives:

auto lgamma_c_approx = [](const std::complex<double>& x) {
  return std::complex<double>(lgamma(x.real()),
                              x.imag() * boost::math::digamma(x.real()));
  };

There are two possible approaches to computing complex step derivative with such custom implementations:

build specializations of necessary library functions admitting std::complex, making the current implementation of the function work with complex step. This could be a test-only library as we probably don’t want to invest in vouching for 100% correctness of the implementations. On the other hand, as there is work on std::complex<var>, such functions might be necessary anyway. For lgamma (and possibly others) there are complex implementations we could easily use, e.g. the one on Wikipedia
write a specialized test-only version of the function to be tested (or it’s simplest formula) using specialized functions - this is what I currently do for the neg_binomial_2_lpmf test against complex step

Hope that is more specific and actionable.

Bob_Carpenter · January 23, 2020, 7:00pm

Thanks! Yes, that’s clear.

I have a branch in stan-math that’s going to fold complex number support into Stan math for autodiff vars and matrices.

It’d be great if we could start expanding our library to handle complex inputs, even if we just have to define everything templated to start and autodiff through it. The branch above will provide support for autodiff for every function defined for std::complex in the standard library.

Topic		Replies	Views
Testing user defined probability function implementations in Stan generated quantities Modeling	3	682	August 23, 2023
Discussing tests in Stan math Developers math	5	857	January 28, 2020
Distribution tests - vector parameters Developers	4	410	July 1, 2019
Remove tests for row vectors (from the distribution tests within the math library) Developers maintenance	25	1647	November 2, 2017
Testing for 7-parameter function with known partials Developers	13	650	July 3, 2025

Do we need more unit tests for distributions?

Related topics