Separation of concerns in autodiff tests

martinmodrak · February 14, 2020, 10:19am

There’s a lot of useful input on tests in general, thanks for that.

I have to admit, however, that I am slightly frustrated that the discussion almost ignores what I believe are the core questions: How do we make writing test helpers simpler? Do we want test helpers to have clearly separate concerns, or do we prefer them to test very broadly and overlap?

So far the only reaction that is IMHO directly relevant to my core concerns is:

Which I interpret as: “We want broad tests and we don’t care a lot for making test helpers simpler to write”. And I am open to this being the best solution, although I remain unconvinced at this point.

I also don’t want this discussion to take more time than it would take to actually implement the test helpers the current (IMHO more tedious and harder to maintain) way. I do care about this as currently it is me implementing those and I have limited time to devote to this. I hope you trust me that I spent some time thinking about this and my proposals are not completely frivolous. Please try to aim for the core of the issue. It is also possible I am just communicating this badly, but I don’t currently see how I can do much better, so I’d like to ask for a charitable re-reading of my proposals.

As I said, there are a lot of good points, and I agree I missed a bunch of stuff expect_ad is doing, but I think those are minor details (if you disagree, please explain why you think a particular point is important for the big picture).

Below I react only to stuff I believe touches on the core:

B) and C) overlap only partially, since, for example, the gradients from foo(T, double) and foo(T, T) are never directly compared to each other. Note that I could currently take, for example, log_sum_exp_vd_vari and multiply its gradient by 1 + 1e-8 (but not do this for the other versions), and expect_ad and the proposed expect_expression_identity would likely still pass for all instantiations, as 1e-8 is below most tolerances we use for gradients. The advantage of testing C) by direct comparison between, say, foo(T, double) and foo(T, T) is that I can use a very low tolerance. I agree few bugs are likely to manifest this way, so it is not a big deal. My main goal is simplifying the testing code and making it more maintainable.
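To make the tolerance argument concrete, here is a minimal sketch in plain C++. It is not Stan Math code: the function names, the finite-difference step, and the two tolerances (1e-6 and 1e-12) are assumptions chosen for illustration. It injects a relative error of 1e-8 into one hand-written gradient of log_sum_exp and compares it both against a finite-difference estimate and against the unperturbed gradient.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Numerically stable log(exp(a) + exp(b)).
double log_sum_exp(double a, double b) {
  double m = std::max(a, b);
  return m + std::log(std::exp(a - m) + std::exp(b - m));
}

// Hypothetical stand-in for the gradient w.r.t. a in the foo(T, T) version.
double grad_a_both_var(double a, double b) {
  return 1.0 / (1.0 + std::exp(b - a));
}

// Hypothetical stand-in for the foo(T, double) version, with a
// deliberately injected relative error of 1e-8.
double grad_a_var_double(double a, double b) {
  return grad_a_both_var(a, b) * (1.0 + 1e-8);
}

// Central finite-difference estimate of d/da log_sum_exp(a, b).
double finite_diff_a(double a, double b, double h = 1e-6) {
  return (log_sum_exp(a + h, b) - log_sum_exp(a - h, b)) / (2.0 * h);
}

// Absolute-tolerance comparison.
bool within_tol(double x, double y, double tol) {
  assert(tol > 0.0);
  return std::fabs(x - y) <= tol;
}
```

With a = 0.3 and b = -1.2, `within_tol(finite_diff_a(a, b), grad_a_var_double(a, b), 1e-6)` holds, so a finite-difference check at that tolerance does not notice the injected error, while `within_tol(grad_a_both_var(a, b), grad_a_var_double(a, b), 1e-12)` does not hold, so the direct comparison catches it.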

I believe that testing time and memory consumption are currently strongly dominated by compilation, so I wouldn’t worry about this much. But please correct me if I am wrong. As I noted, the number of templates is a limiting factor for some of the tests we have, so I guess that reducing templating could actually be a net gain in test time, even if we instead construct some large matrices during tests. Both my proposals reduce the number of templates instantiated per test.

If we can put very strict limits on differences between the results (values, gradients, hessians, …) of foo(double, T), foo(T, double) and foo(T, T), then any strong test on foo(T, T) is also a strong test for foo(double, T) and foo(T, double). The proposed expect_instantiations would fail if the differences are anything but negligible. Further, please note that my Proposal 2 means all instantiations are tested in all cases, but still makes implementing new test helpers easier.
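A rough sketch of what such an agreement check could look like, in plain C++ without any autodiff library. Everything here is hypothetical: `Result`, the three hand-written versions of foo(a, b) = a * exp(b), the helper `instantiations_agree`, and the 1e-14 relative tolerance are illustrations only, not the actual signature or behavior of the proposed expect_instantiations.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Value plus partial derivatives w.r.t. each argument.
struct Result { double val; double d_a; double d_b; };

// Three hand-written versions of foo(a, b) = a * exp(b), mimicking
// separate specializations for (T, T), (T, double) and (double, T).
Result foo_vv(double a, double b) { double e = std::exp(b); return {a * e, e, a * e}; }
Result foo_vd(double a, double b) { double e = std::exp(b); return {a * e, e, 0.0}; }
Result foo_dv(double a, double b) { double e = std::exp(b); return {a * e, 0.0, a * e}; }

// A (T, double) version with a relative error of 1e-8 in its gradient.
Result foo_vd_buggy(double a, double b) {
  Result r = foo_vd(a, b);
  r.d_a *= (1.0 + 1e-8);
  return r;
}

using Fn = Result (*)(double, double);

// Relative comparison, with a floor of 1 on the scale.
bool rel_within(double x, double y, double rel_tol) {
  assert(rel_tol > 0.0);
  double scale = std::max(1.0, std::max(std::fabs(x), std::fabs(y)));
  return std::fabs(x - y) <= rel_tol * scale;
}

// True if values and the shared partials agree across the three versions.
bool instantiations_agree(Fn vv, Fn vd, Fn dv, double a, double b,
                          double rel_tol = 1e-14) {
  Result r_vv = vv(a, b), r_vd = vd(a, b), r_dv = dv(a, b);
  return rel_within(r_vv.val, r_vd.val, rel_tol)
      && rel_within(r_vv.val, r_dv.val, rel_tol)
      && rel_within(r_vv.d_a, r_vd.d_a, rel_tol)
      && rel_within(r_vv.d_b, r_dv.d_b, rel_tol);
}
```

Once `instantiations_agree(foo_vv, foo_vd, foo_dv, a, b)` passes, any check applied to `foo_vv` alone carries over to the other two up to the tolerance; swapping in `foo_vd_buggy` makes the check fail.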
