Testing a 7-parameter function with known partials

Dear all,
I’m working with @Franzi to get the 7-parameter DDM LPDF function ready for a pull request. We wondered how we should go about testing it and would appreciate some advice:

  • Distribution tests: The current framework seems to be limited to 5 parameters. Should we try to extend it, or just test the function with the current functionality, using multiple combinations of 5 parameters while keeping the other 2 fixed?
  • Second and third derivatives: Currently the function only supports first derivatives, i.e. arguments passed as double or fvar<double> (our derivatives don’t work with the var type yet, similar to the wiki example). The distribution/autodiff tests pass, for example, fvar<fvar<double>> to the function; should we adjust the function to make this possible? (See the sketch after this list.)
  • In addition, we have some unit tests to check the basic functionality.
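To illustrate the second point, here is a minimal sketch (not our actual code) of what an fvar<fvar<double>> instantiation computes. The nested tangents mean a single evaluation yields the value, the first, and the second derivative; the squaring function here is just a stand-in for the lpdf:

#include <stan/math.hpp>

using stan::math::fvar;

// fvar<fvar<double>> nests tangents: seeding both inner tangents with 1
// makes y.d_.d_ the second derivative of the computed function.
double second_derivative_of_square(double x) {
  fvar<fvar<double>> x_ffd(fvar<double>(x, 1.0), fvar<double>(1.0, 0.0));
  fvar<fvar<double>> y = x_ffd * x_ffd;  // stand-in for the lpdf call
  return y.d_.d_;                        // equals 2 for f(x) = x^2
}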

The autodiff testing framework also requires higher-order derivatives. Do we have to use these tests if we can instead check the partials against correct values from other sources?

Best,
Valentin

A 7-parameter function is going to be challenging to test.

I’m also worried the full set of signatures might wind up being too large, and that it will tax our language’s type inference because it blows up the combinatorics of the types. If each argument can be double or var, you already have 2^7 = 128 combinations of arguments to test. If each can also be an array, vector, or row vector, with each container either primitive or autodiff, that’s 8 choices per argument and 8^7 > 10^6 combinations. This might be too much for our language type inference. @WardBrian, any idea what will happen with this?

Although we prefer fully defined functions, it’s OK to write functions that only have reverse-mode autodiff.

The current autodiff test framework is set up to run all of the higher-order tests as well as reverse mode. It would be helpful to modify the whole thing to let you specify which derivatives are being tested. That’d be a big, but super helpful, change to our testing framework.

The usual way to make higher-order autodiff work is to write a templated primitive version. To do that, each argument needs to be templated separately, and all the primitives used need to be differentiable at the appropriate order.
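As a minimal sketch of that pattern (the function name and density here are made up; stan::return_type_t promotes to the appropriate scalar type):

#include <stan/math.hpp>

// Each argument gets its own template parameter so any mix of double, var,
// and fvar instantiations works; every operation used must itself be
// differentiable at the order being tested.
template <typename T_y, typename T_mu, typename T_sigma>
stan::return_type_t<T_y, T_mu, T_sigma> toy_lpdf(const T_y& y, const T_mu& mu,
                                                 const T_sigma& sigma) {
  auto z = (y - mu) / sigma;
  return -0.5 * z * z - stan::math::log(sigma);
}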

You don’t need to use the testing framework if you have known derivatives, but I don’t know how to check all the combinations of input types otherwise.

Do you have any suggestions, @valmapra?

We store every defined signature in memory, so this would be bad (though we don’t differentiate between double/var, so we probably wouldn’t get up to 8^7, but 4^7 is still 16k)

A not insignificant portion of our testing in the compiler is testing the combinatorics of a few distributions with ~5 arguments and a lot of overloads, so it’s a valid concern.

I should note that it probably wouldn’t be too horrible. Even assuming a probably-overkill allocation of 128 bytes per signature, we’re only talking about an additional 2 MB if each of the 7 arguments had 4 valid types (4^7 = 16384 signatures × 128 bytes ≈ 2 MB). The signatures live in a hash map, so this also wouldn’t really impact compile times of models that don’t use the function.

So the real question is how many overloads there would be for each argument.

Thanks a lot for your suggestions. The terminology in my last post was a bit unclear: it’s 7 parameters plus the data, so we end up with 8 arguments.

Currently they are all vectorized to make the function more convenient to use. If we leave iterating over the arguments to the end user, we could remove vectorization for the 7 parameters if absolutely necessary. @WardBrian: with 8 arguments, the output of stanc --dump-stan-math-signatures | grep -A 2 wiener_full_lpdf is about 8 MB, as you estimated. Would this be too much?

@Bob_Carpenter: Regarding specifying which derivatives are being tested: I had a look at test/prob/generate_tests.cpp. The create_files function is called 6 times in main, with an index parameter selecting the kind of test (var, ffv, varmat, fd, fv, ffd). If we allowed passing in which indices should be included, maybe it wouldn’t be that big a change (I could have missed something, though). One possibility would be to introduce an optional line (e.g. // Derivatives: var fv), similar to how the “Arguments” line is currently processed. In create_files we could then check (see the sketch after this list):

  • Is this line present? No → continue as before.
  • Yes → is the string corresponding to the current index present (e.g. index 1 corresponds to var, so if index == 1, check whether “var” is there)? No → return without creating tests.
  • Yes → continue as before, creating tests.
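A rough sketch of that check, assuming the optional // Derivatives: line has already been tokenized (the names here are made up for illustration, not the actual generate_tests.cpp API):

#include <algorithm>
#include <string>
#include <vector>

// derivative_tokens: words found on an optional "// Derivatives: var fv"
// line; empty means the line is absent and all test kinds are generated.
bool should_generate(const std::vector<std::string>& derivative_tokens,
                     int index) {
  // Hypothetical 1-based mapping, so index 1 corresponds to "var".
  static const std::vector<std::string> kinds
      = {"", "var", "ffv", "varmat", "fd", "fv", "ffd"};
  if (derivative_tokens.empty())
    return true;  // no Derivatives line: continue as before
  return std::find(derivative_tokens.begin(), derivative_tokens.end(),
                   kinds[index]) != derivative_tokens.end();
}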

In our case, the huge number of parameters could get in the way, as the framework doesn’t allow for that many parameters, and it would probably take a long time even if it did.
As our subfunctions only deal with doubles, the way we handle the templated arguments is very straightforward. For an argument const T_y& y we do the following (all our arguments behave like y; checks omitted):

using T_y_ref = ref_type_t<T_y>;        // evaluate expression types once
T_y_ref y_ref = y;
scalar_seq_view<T_y_ref> y_vec(y_ref);  // uniform indexing for scalars and containers
for (size_t i = 0; i < N; i++) {
    const double y_val = y_vec.val(i);  // underlying double value, whatever T_y is
    ...
}

at which point we always have a double in y_val to work with. The derivatives are constructed using operands_and_partials, as sketched below. Therefore I think the risk of something going wrong with the types is quite low (assuming that only var, double, and the corresponding vectorized types are passed). Do you think we could skip the testing framework and rely on hand-written unit tests (say, scalar and vector with types double and var for each argument, using scalar doubles for all the other arguments, which would give a manageable 4 × 8 = 32 cases)?
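For reference, the overall pattern looks roughly like this, with a placeholder density standing in for ours (a one-argument sketch, not our actual function):

#include <stan/math.hpp>

// Sketch of the operands_and_partials pattern: compute with doubles,
// accumulate the log density, and store analytically known partials.
template <typename T_y>
stan::return_type_t<T_y> sketch_lpdf(const T_y& y) {
  using T_y_ref = stan::ref_type_t<T_y>;
  T_y_ref y_ref = y;
  stan::scalar_seq_view<T_y_ref> y_vec(y_ref);
  size_t N = stan::math::size(y);
  stan::math::operands_and_partials<T_y_ref> ops_partials(y_ref);
  double logp = 0.0;
  for (size_t i = 0; i < N; i++) {
    const double y_val = y_vec.val(i);
    logp += -0.5 * y_val * y_val;  // placeholder log-density term
    if (!stan::is_constant_all<T_y>::value)
      ops_partials.edge1_.partials_[i] += -y_val;  // placeholder known partial
  }
  return ops_partials.build(logp);
}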

8 MB of text output doesn’t necessarily mean that’s how much memory the compiler will use for it. Can you pipe the grep into wc -l instead, to get a rough count of the signatures involved?

Edit: Based on some rough numbers I just collected, we currently store ~30k signatures in ~7 MB.

That’s exactly 4^8 = 65536 signatures, which would then be about 15 MB by your numbers.

Yeah, I think that won’t work. I’m having a hard time pinning down exactly what the memory usage of our current list is (I’m getting numbers between 7 and 17 MB depending on which method I use), but increasing it by ~200% is probably a no-go.

Ok, thanks for pointing that out; then we’ll reduce this. For us there are 2 ways that would make sense (though we are not sure yet which one is better): one with 1 vectorized argument and one with 5, resulting in 512 or 8192 signatures. Would the latter already be small enough, or do we not even have to consider it?
Edit: Sorry, I messed up the calculation, using a factor of 2 instead of 1 for the scalar arguments. The right numbers are 4 (= 4 × 1^7) and 1024 (= 4^5 × 1^3).

8192 is still 4x larger than the current most-overloaded function, but it’s within the realm where I don’t think it’s an immediate dealbreaker. At that point it’s more a question of what the demand/use case for this function is and whether that justifies the extra testing and memory usage.

Ok, we’ll probably go with allowing vectorized input only for the data, not the parameters, so we’ll end up with only 4 signatures (see my edit above). Thanks for your help in sorting this out :D

@Bob_Carpenter: The signature is only specified in stanc3, so since the C++ function in stan-math is templated, it will still be able to take vectors even if one can no longer pass them through the Stan language. Is this ok, or should we actively reject vector input for these parameters?
Do you have an opinion on the tests? Would it be acceptable to just go with unit tests (see above)?

Yes, we only need unit tests.

But we want unit tests for the C++ functions, not just what gets exposed in Stan. So if you’re not going to test vector inputs, you shouldn’t accept them.

I’d hate for you to have to write this function sub-optimally because of testing. Is there really no way to test all of the arguments, even one at a time? You can exploit things you know about the implementation to formulate tests (it just makes them brittle to refactoring).
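For instance, a one-argument-at-a-time unit test could look like this (a hedged sketch: the argument order of wiener_full_lpdf and the value 0.123456 are placeholders, and the reference partial would come from an independent source):

#include <stan/math.hpp>
#include <gtest/gtest.h>

// Make a single argument a var and fix the rest as scalar doubles, then
// compare the adjoint against an externally computed partial.
TEST(mathWienerFull, partial_wrt_y) {
  using stan::math::var;
  var y = 1.5;  // the argument under test
  var lp = stan::math::wiener_full_lpdf(y, 0.5, 1.0, 2.0, 0.3, 0.1, 0.2, 0.05);
  lp.grad();                             // reverse pass fills the adjoints
  EXPECT_NEAR(0.123456, y.adj(), 1e-8);  // known-good partial (placeholder)
}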
