I am currently trying to implement some helper methods that would make it easier to write strong tests for the accuracy of values and derivatives in Stan math. The most powerful test helper right now is the expect_ad function, which serves three purposes:
A) Tests that it is possible to instantiate all versions of the function (with all combinations of double, var, fvar, fvar<var>)
B) Tests whether the gradients and Hessians computed by finite differences and by autodiff are similar (often with relatively weak tolerances, as finite differences are finicky). These tests are run for each of the instantiations.
C) (implied consequence of A) and B)) Tests that all instantiations - which often don’t share code - compute the same value, derivatives and Hessian if applicable.
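To make B) concrete, here is a minimal sketch of a finite-difference check for a one-argument function. It is not the actual expect_ad code: central_diff, is_near, foo and foo_grad are hypothetical names, and a hand-written derivative stands in for the autodiff result. With a step of 1e-6, round-off in the difference alone is around 1e-10, which is why the tolerance has to be much looser than machine precision.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <functional>

namespace fd_sketch {

// Central finite-difference approximation of f'(x).
inline double central_diff(const std::function<double(double)>& f, double x,
                           double h = 1e-6) {
  return (f(x + h) - f(x - h)) / (2.0 * h);
}

// Mixed absolute/relative closeness check (hypothetical helper).
inline bool is_near(double a, double b, double tol) {
  double scale = std::max({1.0, std::fabs(a), std::fabs(b)});
  return std::fabs(a - b) <= tol * scale;
}

// Example function and its exact derivative (stand-in for autodiff output).
inline double foo(double x) { return std::log1p(std::exp(x)); }
inline double foo_grad(double x) { return 1.0 / (1.0 + std::exp(-x)); }

}  // namespace fd_sketch
```

A check would then read is_near(central_diff(foo, 0.5), foo_grad(0.5), 1e-8); tightening the tolerance toward 1e-14 is not something finite differences can be relied on to pass.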
I am trying to write a new test helper with the working name expect_expression_identity that would help test that an identity holds for a function. @Bob_Carpenter mentioned it would be preferable to have this helper also test the identity across all instantiations. I don’t think this is optimal, but I was unable to state my concerns clearly in the GitHub issue, so I am writing at more length here.
Why care?
There are multiple test helpers that IMHO should be implemented, and they would raise the same question regarding instantiations, so it is better to resolve it once. I think we definitely want expect_precomputed for streamlining tests against precomputed values and expect_complex_step for testing the gradients from all modes against complex step differentiation. I also remember @bgoodri mentioning tests of integrals of functions, which could also use some test helpers, and possibly a few more will come up.
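For readers unfamiliar with complex step differentiation, here is a minimal sketch of the idea using only std::complex. The names complex_step and foo are hypothetical, and this is not how a Stan math helper would necessarily be written. The derivative is approximated as Im(f(x + ih)) / h, and since no subtraction of nearly equal values is involved, h can be tiny and the result is accurate to roughly machine precision.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

namespace cs_sketch {

// A function written generically so it accepts double and std::complex<double>.
template <typename T>
T foo(const T& x) {
  using std::exp;
  using std::sin;
  return exp(x) * sin(x);
}

// Complex step approximation of f'(x): Im(f(x + i*h)) / h.
// F must be callable with std::complex<double>.
template <typename F>
double complex_step(const F& f, double x, double h = 1e-20) {
  return f(std::complex<double>(x, h)).imag() / h;
}

}  // namespace cs_sketch
```

The catch is that the function under test has to be analytic and has to accept complex arguments, so this complements finite differences rather than replacing them.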
Current approach
For a hypothetical two-parameter function foo, the current practice would have roughly this testing structure (I hope I don’t mess up the details, but the big picture is hopefully OK):
In expect_ad:
- foo(var, double) is near foo(double, double)
- foo(double, var) is near foo(double, double)
- foo(var, var) is near foo(double, double)
- gradient of foo(var, double) is near finite diffs of foo(double, double)
- gradient of foo(double, var) is near finite diffs of foo(double, double)
- gradient of foo(var, var) is near finite diffs of foo(double, double)
- foo(fvar, double) is near foo(double, double)
- foo(double, fvar) is near foo(double, double)
- foo(fvar, fvar) is near foo(double, double)
- gradient of foo(fvar, double) is near finite diffs of foo(double, double)
- gradient of foo(double, fvar) is near finite diffs of foo(double, double)
- gradient of foo(fvar, fvar) is near finite diffs of foo(double, double)
- foo(fvar<var>, double) is near foo(double, double)
- foo(double, fvar<var>) is near foo(double, double)
- foo(fvar<var>, fvar<var>) is near foo(double, double)
- gradient and Hessian of foo(fvar<var>, double) is near finite diffs of foo(double, double)
- gradient and Hessian of foo(double, fvar<var>) is near finite diffs of foo(double, double)
- gradient and Hessian of foo(fvar<var>, fvar<var>) is near finite diffs of foo(double, double)
Current approach - extrapolation
For expect_expression_identity (not implemented yet), following the same approach would entail these tests:
- f(foo(double, double)) is near g(foo(double, double))
- f(foo(double, var)) is near g(foo(double, var)) (including derivatives)
- f(foo(var, double)) is near g(foo(var, double)) (including derivatives)
- f(foo(var, var)) is near g(foo(var, var)) (including derivatives)
- f(foo(double, fvar)) is near g(foo(double, fvar)) (including derivatives)
- f(foo(fvar, double)) is near g(foo(fvar, double)) (including derivatives)
- f(foo(fvar, fvar)) is near g(foo(fvar, fvar)) (including derivatives)
- f(foo(double, fvar<var>)) is near g(foo(double, fvar<var>)) (including derivatives and Hessian)
- f(foo(fvar<var>, double)) is near g(foo(fvar<var>, double)) (including derivatives and Hessian)
- f(foo(fvar<var>, fvar<var>)) is near g(foo(fvar<var>, fvar<var>)) (including derivatives and Hessian)
I believe, however, that it is impractical to require all test helpers to evaluate all instantiations (the required templating is tedious) and that the implied tests for C) are unnecessarily weak.
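To give a feel for what an identity check involves even for a single argument and two scalar types, here is a rough sketch. Everything in it is hypothetical: Dual is a toy forward-mode scalar standing in for fvar<double> (it is not Stan's type), identity_holds is a made-up name rather than the planned helper, and reverse mode and Hessians are left out entirely.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

namespace identity_sketch {

// Toy forward-mode scalar: value and one directional derivative.
struct Dual {
  double val;
  double d;
};

inline Dual operator+(double a, Dual b) { return {a + b.val, b.d}; }
inline Dual log(Dual a) { return {std::log(a.val), a.d / a.val}; }
inline Dual log1p(Dual a) { return {std::log1p(a.val), a.d / (1.0 + a.val)}; }

inline bool is_near(double a, double b, double tol) {
  double scale = std::max({1.0, std::fabs(a), std::fabs(b)});
  return std::fabs(a - b) <= tol * scale;
}

// Checks that two generic functors agree at x, once for double (value only)
// and once for Dual (value and first derivative).
template <typename F, typename G>
bool identity_holds(const F& f, const G& g, double x, double tol = 1e-8) {
  if (!is_near(f(x), g(x), tol)) return false;
  Dual xd{x, 1.0};  // seed the derivative with respect to x
  Dual fd = f(xd);
  Dual gd = g(xd);
  return is_near(fd.val, gd.val, tol) && is_near(fd.d, gd.d, tol);
}

}  // namespace identity_sketch
```

Both sides of the identity are passed as generic lambdas, for example log1p(x) against log(1.0 + x). Each additional argument and each additional scalar type multiplies the number of calls that have to be spelled out, which is the templating burden I am worried about.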
Proposed approach 1
We are IMHO better served by writing a separate test that directly tests A) and C) (called expect_instantiations below) and leaving expect_ad to handle B) as a separate problem. In other words, we should separate the concerns of the tests.
The proposed tests would then be:
In expect_instantiations:
- foo(var, var) equals foo(double, double)
- foo(var, var) equals foo(double, var) (including shared derivatives)
- foo(var, var) equals foo(var, double) (including shared derivatives)
- foo(fvar, fvar) equals foo(double, double)
- foo(fvar, fvar) equals foo(double, fvar) (including shared derivatives)
- foo(fvar, fvar) equals foo(fvar, double) (including shared derivatives)
- foo(fvar<var>, fvar<var>) equals foo(double, double)
- foo(fvar<var>, fvar<var>) equals foo(double, fvar<var>) (including shared derivatives and Hessians)
- foo(fvar<var>, fvar<var>) equals foo(fvar<var>, double) (including shared derivatives and Hessians)
In expect_ad:
- gradient of foo(var, var) matches finite diffs of foo(double, double)
- gradient of foo(fvar, fvar) matches finite diffs of foo(double, double)
- gradient and Hessian of foo(fvar<var>, fvar<var>) matches finite diffs of foo(double, double)
In expect_expression_identity:
- f(foo(double, double)) is near g(foo(double, double))
- f(foo(var, var)) is near g(foo(var, var)) (including derivatives)
- f(foo(fvar, fvar)) is near g(foo(fvar, fvar)) (including derivatives)
- f(foo(fvar<var>, fvar<var>)) is near g(foo(fvar<var>, fvar<var>)) (including derivatives and Hessian)
The improvement here is not only that we restrict the scope of the messy templating and make implementing new helpers easier, but also that we can IMHO now test C) very strictly: in expect_instantiations, equivalence up to a few ULPs (as implemented in EXPECT_DOUBLE_EQ) should IMHO be achievable. If this is the case, we can safely assume that if expect_instantiations passes, the much weaker tolerances in expect_ad and expect_expression_identity are also satisfied for all instantiations by transitivity.
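For reference, "equal up to a few ULPs" can be sketched as below. This is an illustration of the concept, not Google Test's implementation; ordered_bits, ulp_distance and within_ulps are hypothetical names, and the default of 4 mirrors the documented tolerance of EXPECT_DOUBLE_EQ. The sketch assumes 64-bit IEEE 754 doubles.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

namespace ulp_sketch {

// Maps the bit pattern of a double to an integer that is ordered like the
// double itself (doubles are sign-magnitude, so negatives are flipped).
inline std::int64_t ordered_bits(double x) {
  std::int64_t i;
  std::memcpy(&i, &x, sizeof(double));
  return i < 0 ? std::numeric_limits<std::int64_t>::min() - i : i;
}

// Number of representable doubles between a and b (0 if identical).
inline std::uint64_t ulp_distance(double a, double b) {
  std::int64_t ia = ordered_bits(a);
  std::int64_t ib = ordered_bits(b);
  // Subtract as unsigned to avoid signed overflow for far-apart values.
  return ia > ib
             ? static_cast<std::uint64_t>(ia) - static_cast<std::uint64_t>(ib)
             : static_cast<std::uint64_t>(ib) - static_cast<std::uint64_t>(ia);
}

// NaN never compares equal; otherwise accept a distance of max_ulps or less.
inline bool within_ulps(double a, double b, std::uint64_t max_ulps = 4) {
  if (std::isnan(a) || std::isnan(b)) return false;
  return ulp_distance(a, b) <= max_ulps;
}

}  // namespace ulp_sketch
```

With this measure, 0.1 + 0.2 and 0.3 are 1 ULP apart and count as equal, while 1.0 and 1.0 + 1e-12 are thousands of ULPs apart and do not, even though both pairs would pass a typical finite-difference tolerance.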
Proposed approach 2
If we really want to maintain testing for all instantiations in all cases, some of the simplifications could also be achieved by writing a helper that just evaluates a functor and its derivatives and Hessians for all instantiations and stores the results in some manageable structure (say a vector of values, a matrix of derivatives with NAs in places where the derivative is not computed for the given instantiation, and a similar 3D structure for the Hessian), so that other helpers just call this and then iterate over the results without templates. I like this one a bit less, however, although I admit the reasons are more aesthetic than technical.
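A rough sketch of what such a structure could look like for a two-argument function follows. All names are hypothetical (Result, evaluate_all, all_agree), Dual is again a toy forward-mode scalar rather than Stan's fvar, NaN plays the role of NA, and Hessians are omitted to keep it short. The point is only that the templated part lives in evaluate_all, while consumers such as all_agree are plain non-template code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

namespace storage_sketch {

const double NA = std::numeric_limits<double>::quiet_NaN();

// Toy forward-mode scalar with just enough arithmetic for the example.
struct Dual { double val; double d; };
inline Dual operator+(Dual a, Dual b) { return {a.val + b.val, a.d + b.d}; }
inline Dual operator+(Dual a, double b) { return {a.val + b, a.d}; }
inline Dual operator*(Dual a, Dual b) {
  return {a.val * b.val, a.d * b.val + a.val * b.d};
}
inline Dual operator*(Dual a, double b) { return {a.val * b, a.d * b}; }
inline Dual operator*(double a, Dual b) { return {a * b.val, a * b.d}; }

// One row per instantiation; gradient entries are NA where not computed.
struct Result {
  std::string label;
  double value;
  std::vector<double> gradient;
};

// The only templated piece: evaluates f for every argument-type combination.
template <typename F>
std::vector<Result> evaluate_all(const F& f, double x, double y) {
  std::vector<Result> out;
  out.push_back({"(double, double)", f(x, y), {NA, NA}});
  Dual dx = f(Dual{x, 1.0}, y);
  out.push_back({"(Dual, double)", dx.val, {dx.d, NA}});
  Dual dy = f(x, Dual{y, 1.0});
  out.push_back({"(double, Dual)", dy.val, {NA, dy.d}});
  // Forward mode yields one directional derivative per pass.
  Dual gx = f(Dual{x, 1.0}, Dual{y, 0.0});
  Dual gy = f(Dual{x, 0.0}, Dual{y, 1.0});
  out.push_back({"(Dual, Dual)", gx.val, {gx.d, gy.d}});
  return out;
}

// Template-free consumer: compares every row against the last (full) one,
// skipping gradient entries marked NA.
inline bool all_agree(const std::vector<Result>& results, double tol) {
  const Result& ref = results.back();
  for (const Result& r : results) {
    if (std::fabs(r.value - ref.value) > tol) return false;
    for (std::size_t i = 0; i < r.gradient.size(); ++i) {
      if (std::isnan(r.gradient[i])) continue;
      if (std::fabs(r.gradient[i] - ref.gradient[i]) > tol) return false;
    }
  }
  return true;
}

}  // namespace storage_sketch
```

Calling evaluate_all with a generic lambda such as x * y + x at (2, 3) gives four rows, and any further helper only needs to loop over them.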
Tagging @Bob_Carpenter and @syclik for comments.
Thanks a lot for reading through this!