`expect_near_rel` behaves weirdly for values around 1e-8

Inspired by @Bob_Carpenter, I started using expect_near_rel instead of various manual solutions to test “approximate equivalence” in my tests for Stan math. I have, however, become slightly suspicious of its behavior recently and want to discuss whether its implementation should change (especially if we want to encourage its use across the codebase).

The relevant part of the code is:

template <typename T1, typename T2>
void expect_near_rel_finite(const std::string& msg, const T1& x1, const T2& x2,
                            double tol = 1e-8) {
  using stan::math::fabs;
  // if either arg is near zero, must use absolute tolerance,
  // because the relative tolerance -> 2 as one argument -> 0
  if (fabs(x1) < tol || fabs(x2) < tol) {
    EXPECT_NEAR(x1, x2, tol) << msg;
    return;
  }
  auto avg = 0.5 * (fabs(x1) + fabs(x2));
  auto relative_diff = (x1 - x2) / avg;
  EXPECT_NEAR(0, relative_diff, tol) << msg;
}

This, however, behaves strangely for values close to the tolerance (1e-8 by default), where there is a steep change in the required precision. For example, I now have a failing test with:

actual:    1.2019658157669255e-008, 
expected:  1.2019540397111151e-008, 
relative tolerance = 1e-008; relative diff = 9.7973780478180374e-006

It is fine for this test to fail, but if the actual value were 0.9e-008, the test would suddenly pass (via the absolute-tolerance branch), even though the difference just increased.
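To see the jump in isolation, here is a minimal self-contained sketch (near_rel is a hypothetical stand-in for the EXPECT_NEAR logic above, plain C++ without googletest):

#include <cmath>
#include <cstdio>

// Mirrors the branch logic of expect_near_rel_finite above.
bool near_rel(double x1, double x2, double tol = 1e-8) {
  if (std::fabs(x1) < tol || std::fabs(x2) < tol)
    return std::fabs(x1 - x2) < tol;  // absolute branch
  double avg = 0.5 * (std::fabs(x1) + std::fabs(x2));
  return std::fabs((x1 - x2) / avg) < tol;  // relative branch
}

int main() {
  double expected = 1.2019540397111151e-8;
  // relative branch: absolute diff ~1.2e-13, relative diff ~9.8e-6 -> fails
  std::printf("%d\n", near_rel(1.2019658157669255e-8, expected));  // prints 0
  // absolute branch: absolute diff ~3e-9 (much larger!) < 1e-8 -> passes
  std::printf("%d\n", near_rel(0.9e-8, expected));                 // prints 1
}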

Some discussion on the topic is in the thread “Relative delta in finite difference approximations of derivatives?”, where @anon75146577 recommends using max(epsilon, epsilon * |x|) as the tolerance. That rule doesn’t have the discontinuity problem, but it might be unnecessarily lax when we care about small numbers; I’ve previously used something like max(1e-14, 1e-8 * |x|).
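A sketch of that rule for concreteness (near_abs_rel and tol_floor are hypothetical names, not Stan math API; with tol_floor = epsilon this is the suggested max(epsilon, epsilon * |x|), and with tol_floor = 1e-14, epsilon = 1e-8 it is the variant I’ve used):

#include <algorithm>
#include <cmath>

// One continuous tolerance instead of two branches: absolute for tiny x,
// relative once epsilon * |x| exceeds the floor.
bool near_abs_rel(double x1, double x2, double epsilon = 1e-8,
                  double tol_floor = 1e-8) {
  double tol = std::max(tol_floor, epsilon * std::fabs(x1));
  return std::fabs(x1 - x2) <= tol;
}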

So I am asking whether there is a good reason expect_near_rel is implemented the way it is, and whether it might be sensible to implement it differently. expect_near_rel is currently used in 21 tests, so changing it should not be a major undertaking. It is also possible that expect_near_rel is tailored to the autodiff case and shouldn’t be used elsewhere…

Thanks for any input!

Aren’t the current tests kinda lax for small numbers too?

Possibly yes. My motivation is that I am trying to handle numerical inaccuracies in the current implementation of neg_binomial_2 and I am seriously scared of messing something up, so I want to make sure I do the “right thing” in my tests.

What I mean by that is:

EXPECT_NEAR(x1, x2, tol) 

Doesn’t that just check that abs(x1 - x2) < tol? If so, then a comparison of something like 1e-16 vs. 1.5e-16 has the same problem under abs(x1 - x2) < max(epsilon, epsilon * abs(x)), right?
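Concretely (a hypothetical numeric check): for values this small, the max(epsilon, epsilon * |x|) rule collapses to a pure absolute tolerance of epsilon, so the comparison passes despite a 50% relative difference:

#include <algorithm>
#include <cmath>
#include <cstdio>

int main() {
  double x1 = 1e-16, x2 = 1.5e-16, epsilon = 1e-8;
  // epsilon * |x1| = 1e-24, so the max() collapses to epsilon, i.e.
  // the same plain absolute check EXPECT_NEAR performs.
  double tol = std::max(epsilon, epsilon * std::fabs(x1));
  std::printf("%d\n", std::fabs(x1 - x2) < tol);  // prints 1: "passes"
}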


Nope. Just my best shot. I don’t think I understood Dan’s suggestion or I would’ve implemented it.

It’s the basis of expect_ad, so it’s used in a whole lot more than that indirectly.

Yes.

Disclaimer: I just taught myself some numerics over the past few months, so I might easily be very wrong.

I’ve been thinking a bit about this, and the option I currently like best is to make the tolerance depend on the average, as in:

auto avg = 0.5 * (fabs(x1) + fabs(x2));
auto tol_final = std::max(tol_min, tol * avg);
EXPECT_NEAR(0, x1 - x2, tol_final);

Here both tol and tol_min can be set by the user, and the default for tol_min is tol^2.
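Put together, a sketch of what this could look like (the name expect_near_rel_proposed and the negative-sentinel default for tol_min are my illustration, not actual PR code):

#include <gtest/gtest.h>
#include <algorithm>
#include <cmath>
#include <string>

// Proposed rule: the absolute tolerance scales linearly with the average
// magnitude, with a floor of tol_min (default tol^2), so the implied
// relative tolerance changes continuously instead of jumping at avg = tol.
void expect_near_rel_proposed(const std::string& msg, double x1, double x2,
                              double tol = 1e-8, double tol_min = -1) {
  if (tol_min < 0)
    tol_min = tol * tol;  // default floor
  double avg = 0.5 * (std::fabs(x1) + std::fabs(x2));
  double tol_final = std::max(tol_min, tol * avg);
  EXPECT_NEAR(x1, x2, tol_final) << msg;
}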

The proposed approach gives the same relative tolerance as the current approach when avg >= tol, but below that the implied relative tolerance increases gradually instead of jumping abruptly (with tol = 1e-8, the current approach jumps from a relative tolerance of 1e-8 to roughly 1 at the branch point).

Graphically, the relative tolerance implied by the current and the proposed approach with tol = 1e-8 looks like this (the R code to reproduce the plot is at the end of this post):

[plot: implied relative tolerance vs. 0.5 * (fabs(x1) + fabs(x2)) on log-log axes, one line per approach]

The code I’ve written for my recent pull request passes this more stringent standard, but some other tests fail; I am not sure whether this is cause for concern or simply means that the tolerances need to be better adjusted.

Does that make sense as a recommended practice? Or am I missing something obvious?

Code for the plot
library(ggplot2)

x <- 10^(seq(-15, 5, length.out = 100))
tol_current <- ifelse(x < 1e-8, 1e-8, x * 1e-8)
tol_proposed <- pmax(x * 1e-8, 1e-16)
df <- rbind(
  data.frame(x = x, type = "current", tol = tol_current),
  data.frame(x = x, type = "proposed", tol = tol_proposed))
df$relative_tol <- df$tol / df$x

ggplot(df, aes(x = x, y = relative_tol, color = type)) +
  geom_line() +
  scale_x_log10("0.5 * (fabs(x1) + fabs(x2))") +
  scale_y_log10()


Join the club!

I don’t know. I have very poor intuitions about this kind of thing. But I don’t have issues with changing things to stricter tolerances. We can just loosen the tolerances of the tests that fail. The problem, I think, is that many of our functions don’t have great relative tolerance behavior around zero.

Maybe @anon75146577 has an opinion.


I filed an issue for this (https://github.com/stan-dev/math/issues/1656) and I am going to test what happens if the behavior changes.


So the change seems relatively painless. I’ve started a PR (https://github.com/stan-dev/math/pull/1657); happy to get any feedback.
