Let’s say I’m familiar with templated C++ and in most cases I can wrap my head around your code base, but haven’t done any complex template stuff myself.
I’m familiar with the basic architecture and what’s happening, as I’ve done some profiling of compiled Stan models and helped @rok_cesnovar implement a few GPU math functions and unit tests for Stan math.
Yes, I’m familiar with that.
it should be noted that I’m picking up a lot of stuff along the way as I look into the Stan math code, since my main focus before was GPU programming and I didn’t dig very deep into other areas.
I think it would be best if I start with something straightforward. I presume adding tests for higher-order derivatives by hand would be such a thing.
It would help to write an issue and then create a PR to close that issue when it’s merged. That way we can keep track of what’s going on. We manage our releases by looking at which issues were closed during that release.
If you run into any problems or think of ways to improve the process, just post on the forums. Thanks!
@syclik I have been doing a lot of digging around the unit tests. Before I go opening any issues I need a little help, because I got a bit lost in the process. I assume the higher-order derivative tests for all kinds of functions live in the test/unit/math/mix directory. I looked over the tests for the core functions and I think I get the concept. It’s maybe a dumb question, but I can’t seem to pinpoint where I should start adding tests, because there is a lot going on in there. Could you give me a straightforward clue on where to begin?
test/unit/ is where the unit tests live. We have test/prob, where the probability distribution tests live, and we’ve left this level in here so we can add other types of tests in the future.
within the test/unit/math/ level, you’ll see different folders. The relevant ones are prim for primitive (without autodiff), rev for reverse mode, fwd for forward mode, and mix for mixed mode. Our reverse-mode class is stan::math::var. Our forward mode is templated; things inside fwd only test stan::math::fvar&lt;double&gt;. For mixed mode, we mean stan::math::fvar&lt;stan::math::var&gt; (and really fvar&lt;fvar&lt;var&gt;&gt;).
after that level, everything is arranged parallel to the source code.
tests… all the test files are named *_test.cpp. We use the name to know that it needs to be built with the Google testing framework.
Definitely not a dumb question.
If you’re looking to start by adding simple tests, you can pick a function inside stan/math/prim/scal/fun/*.hpp and test it (either in forward mode or mixed mode). It really helps to write a couple tests out by hand before trying to embark on a well-designed testing framework.
For example, there are 125 different scalar function files but only 84 tests in forward mode:

$ ls stan/math/prim/scal/fun/*.hpp | wc -l
125
$ ls test/unit/math/fwd/scal/fun/*_test.cpp | wc -l
84
For one concrete example, there’s no F32 (hypergeometric 3F2) test, so you could start there. (I’m looking at the code and this one might actually have trouble, so maybe something easier?) Let’s assume we go with this one. What are we testing in forward mode:
that it compiles. Since we have 0 tests for this, we don’t know if it will actually compile when instantiated with forward mode. That’s because it’s templated, and it won’t fail compilation until instantiation.
that the function value matches what we expect. This may seem trivial, but as we build better functions, we’re often specializing functions so even this trivial condition needs to be checked.
that each of the directional derivatives is correct. This function has 6 arguments that are templated. So I’d expect that each directional derivative is checked (you’ll need to run this 6 times per set of inputs).
error conditions are properly caught. Here, there’s no input checking, but the doc clearly states what’s valid. So there’s nothing to do here, but in other functions there might be places where we throw.
this is tricky, but we want the boundary conditions to have similar properties to prim and rev. We can discuss this when you get there.
As a kick start, you’d put this into test/unit/math/fwd/scal/fun/F32_test.cpp:
Do we want to start doing this before we have the lower-level testing framework in place? It will let us make progress, but will also require us to rewrite everything when we get the general framework.
Yes, I think this is worth it. If it helps us fix parts of the math library that aren’t up to speed, then it’s worth it.
The parts of the tests that are reusable:
the test cases chosen for the test
The parts that aren’t reusable:
the boilerplate: instantiating the autodiff stack, looping, etc.
If someone writes a test out a couple of times, the boilerplate should be pretty quick to write and most of the effort will be in thinking about the test cases, which is what needs to happen with a testing framework anyway.
Will it work if it compiles?
I don’t think so. I could be wrong, but I think because of the way we’ve specialized certain functions, this isn’t true. It’d be nice to know under what conditions it’s not (if it’s some misuse of a template metaprogram, etc).
It might be, though. So if we can establish that by testing thoroughly a few times, I think that effort is worth it.
Which test cases are you thinking about? I thought it was a lot less. (I’m thinking about fvar<double> and scalar functions first.)
I think it’d still be good having some of these tests in place so we can move forward. At the very least, if we want higher order autodiff to work, we do need more people familiar with the internals down to being able to instantiate functions. Yes, having a testing framework will make that easier, but it won’t help with understanding what goes on under the hood.
Wow. While that is a lot of testing, it doesn’t actually test the implementation. It just tests the different error conditions, exhaustively. It doesn’t actually try to take the derivative or check the value of the function. So we know it won’t crash, but we don’t know if it’s a correct implementation.
I don’t think taking away the boilerplate code would have helped with the testing of this function. There should have been a little more guidance at the pull request stage.
That’s 30 different test cases in there. Each one is ~30 lines long. And you’re right, if we had the test framework in place, each test case would have been ~5 lines long instead (so maybe around ~150 lines). But we could have easily used the existing Google Test framework to have built that for this test.
This one has all the arguments being the same type, so this is pretty easy. (That’s how the function was written.) If it actually allowed different types, it’d be a lot harder.
Thanks for the thorough explanation. I definitely need to write a few test cases to get familiar with the overall approach. I tried to implement the test for F32, and it looks like it has some problems, as you mentioned, or maybe I am missing something (also: we should probably move this discussion somewhere else). I wrote a test for the function value and it fails to compile with
error: no matching function for call to 'F32(stan::math::fvar<double>&, double&, double&, double&, double&, double&)'
fvar<double> r = F32(a1,a2c,a3c,b1c,b2c,zc);
meaning that it can’t instantiate with different argument types or at least that is how I understand it?
Next, I tried to tackle the distance function for forward mode; here is the full code. I would really appreciate some comments, to see if I’m on the right track. Well, this test also fails, the problem being the value of the derivative when you set both argument values to be equal. As I understand it, the explicit derivative d/dx1 of the distance function should be (x1-x2)/abs(x1-x2). That should result in NaN if you put in the same value for x1 and x2, but the derivative comes out as 0. Why is that?
you’ll see that there’s only one template type, T, and all the arguments (besides for precision and max_steps) are of type T. That means that you use fvar<double> for the first argument and double for the rest… they have to all be the same.
Would it be possible to put this on GitHub so I can make inline comments? It’s hard to do this over pastebin.
Inside dot_product, you see that there’s just element-wise multiplication. 0 * 0 has a derivative of 0.
So… that’s the implementation.
The question you’re asking is if that’s the correct implementation; that’s a good question to be asking. At discontinuous points, we have to make a decision. Do we go with NaN, 0, inf, or something else that makes sense / doesn’t make sense.
I don’t remember what we decided here. It really would have helped to have more documentation in the function or a test that shows the behavior on this boundary, but we don’t have that. Once upon a time, I think we thought that at single discontinuities, under some cases, it makes sense that we return a finite derivative (like 0) that won’t mess up sampling. I think this is a case where that might hold. In other cases, throwing an exception makes sense, but here, I think it’s reasonable to want to take the distance and have vectors be the same resulting in 0. I don’t think we really want the derivatives / gradients to be destroyed at that point due to passing in duplicate vectors. So 0 makes sense here?
@syclik sure thing, I’ll fix my editor settings. Btw, I made a script that goes through all prim, rev, and fwd functions (scalar and matrix), checks which modes each function is implemented for, and checks whether there is a test written for each case. If the function or test exists, it prints out the SLOC (source lines of code), so I have some indication of the complexity of each function/test. This way we can quickly see what is missing and what is bloated. Here is the output.
Cool. That looks super useful. A lot of the checks for these functions are in the mix directory and test all of the instantiations other than double. That includes all the core operators and almost all of the unary vectorized functions.