There’s a lot of useful input on tests in general, thanks for that.
I have to admit, however, that I am slightly frustrated that the discussion almost ignores what I believe are the core questions: How do we make writing test helpers simpler? Do we want test helpers to have clearly separate concerns, or do we prefer them to test very broadly and overlap?
So far the only reaction that is IMHO directly relevant to my core concerns is:
Which I interpret as: “We want broad tests and we don’t care a lot for making test helpers simpler to write”. And I am open to this being the best solution, although I remain unconvinced at this point.
I also don’t want this discussion to take more time than it would take to actually implement the test helpers the current (IMHO more tedious and harder to maintain) way. I do care about this as currently it is me implementing those and I have limited time to devote to this. I hope you trust me that I spent some time thinking about this and my proposals are not completely frivolous. Please try to aim for the core of the issue. It is also possible I am just communicating this badly, but I don’t currently see how I can do much better, so I’d like to ask for a charitable re-reading of my proposals.
As I said, there are a lot of good points and I agree I missed a bunch of stuff `expect_ad` is doing, but I think those are minor details (if you disagree, please explain why you think a particular point is important for the big picture).
Below I react only to stuff I believe touches on the core:
B) and C) overlap only partially, as, say, the gradients from `foo(T, double)` and `foo(T,T)` are never directly compared to each other. Note that I could currently take, for example, `log_sum_exp_vd_vari` and multiply its gradient by `1 + 1e-8` (but not do this for the other versions), and `expect_ad` and the proposed `expect_expression_identity` would likely still pass for all instantiations, as `1e-8` is below most tolerances we use for gradients. The advantage of testing C) by direct comparison between, say, `foo(T, double)` and `foo(T,T)` is that I can have a very low tolerance. I agree few bugs are likely to manifest this way, so it is not a big deal. My main goal is simplifying the testing code and making it more maintainable.
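To make the tolerance argument concrete, here is a minimal, self-contained sketch. It does not use Stan Math at all: `grad_a_reference`, `grad_a_perturbed`, `finite_diff_a` and `within_rel_tol` are hypothetical toy functions, and the tolerances `1e-6` and `1e-12` are values I assume purely for illustration.

```cpp
#include <algorithm>
#include <cmath>

// Toy stand-ins, NOT Stan Math code.
// Numerically stable log(exp(a) + exp(b)).
double log_sum_exp(double a, double b) {
  const double m = std::max(a, b);
  return m + std::log(std::exp(a - m) + std::exp(b - m));
}

// Analytic partial derivative with respect to a.
double grad_a_reference(double a, double b) {
  return 1.0 / (1.0 + std::exp(b - a));
}

// The same gradient with a hypothetical bug: scaled by (1 + 1e-8).
double grad_a_perturbed(double a, double b) {
  return grad_a_reference(a, b) * (1.0 + 1e-8);
}

// Central finite difference in a, the kind of check that needs a loose tolerance.
double finite_diff_a(double a, double b, double h = 1e-6) {
  return (log_sum_exp(a + h, b) - log_sum_exp(a - h, b)) / (2.0 * h);
}

// Relative comparison; tol is an assumed, illustrative tolerance.
bool within_rel_tol(double x, double y, double tol) {
  return std::fabs(x - y) <= tol * std::max(std::fabs(x), std::fabs(y));
}
```

With `a = 0.3` and `b = -1.2`, `within_rel_tol(grad_a_perturbed(a, b), finite_diff_a(a, b), 1e-6)` holds, so the perturbed gradient survives the loose finite-difference check, while `within_rel_tol(grad_a_perturbed(a, b), grad_a_reference(a, b), 1e-12)` does not, so a direct comparison between two implementations catches it.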
I believe that testing time/memory consumption is currently strongly dominated by compilation, so I wouldn’t worry about this a lot. But please correct me if I am wrong. As I noted, the number of templates is a limiting factor for some tests we have, so I guess that reducing templating could actually be a net gain in test time, even if we instead construct some large matrices during tests. Both my proposals reduce the number of templates instantiated per test.
If we can put very strict limits on differences between the results (values, gradients, hessians, …) of `foo(double, T)`, `foo(T, double)` and `foo(T,T)`, then any strong test on `foo(T,T)` is also a strong test for `foo(double, T)` and `foo(T, double)`. The proposed `expect_instantiations` would fail if the differences are anything but negligible. Further, please note that my Proposal 2 means all instantiations are tested in all cases, but still makes implementing new test helpers easier.
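A rough sketch of the idea behind such a check, using the full instantiation as the reference. This is not the actual proposed helper and not Stan Math code: the `Result` struct, the hand-written `foo_vv` / `foo_vd` / `foo_dv` functions and `instantiations_agree` are all hypothetical stand-ins, and real autodiff types are replaced by explicitly stored partials.

```cpp
#include <algorithm>
#include <cmath>

// Toy stand-in for an autodiff result: the value plus partials w.r.t. a and b.
// A false flag means that argument was passed as a plain double (no partial).
struct Result {
  double value;
  double d_a;
  bool has_d_a;
  double d_b;
  bool has_d_b;
};

// Hand-written "instantiations" of the toy function foo(a, b) = a * exp(b).
Result foo_vv(double a, double b) {  // plays the role of foo(T, T)
  return Result{a * std::exp(b), std::exp(b), true, a * std::exp(b), true};
}
Result foo_vd(double a, double b) {  // plays the role of foo(T, double)
  return Result{a * std::exp(b), std::exp(b), true, 0.0, false};
}
Result foo_dv(double a, double b) {  // plays the role of foo(double, T)
  return Result{a * std::exp(b), 0.0, false, a * std::exp(b), true};
}

// Mixed absolute/relative comparison with a very strict tolerance.
bool nearly_equal(double x, double y, double tol) {
  const double scale = std::max(1.0, std::max(std::fabs(x), std::fabs(y)));
  return std::fabs(x - y) <= tol * scale;
}

// Every quantity the mixed instantiation tracks must match the full one.
bool agrees_with_full(const Result& full, const Result& mixed, double tol) {
  if (!nearly_equal(full.value, mixed.value, tol)) return false;
  if (mixed.has_d_a && !nearly_equal(full.d_a, mixed.d_a, tol)) return false;
  if (mixed.has_d_b && !nearly_equal(full.d_b, mixed.d_b, tol)) return false;
  return true;
}

// Hypothetical analogue of the check: compare each mixed version to foo(T, T).
bool instantiations_agree(double a, double b, double tol) {
  const Result full = foo_vv(a, b);
  return agrees_with_full(full, foo_vd(a, b), tol)
      && agrees_with_full(full, foo_dv(a, b), tol);
}
```

Once `instantiations_agree` holds at a strict tolerance (say `1e-14`, an assumed value), the more expensive checks only need to run against the `foo_vv` variant, which is the point of the argument above.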