Higher-Order Autodiff, testing framework & RHMC


#21

Thanks, here are my answers:

  • Let’s say I’m familiar with templated C++ and in most cases I can wrap my head around your code base, but haven’t done any complex template stuff myself.

  • I’m familiar with the basic architecture and what’s happening, as I’ve done some profiling of compiled Stan models and helped @rok_cesnovar implement a few GPU math functions and unit tests for Stan math.

  • Yes, I’m familiar with that.

It should be noted that I’m picking up a lot of stuff along the way as I look into the Stan math code, since my main focus before was on GPU programming and I didn’t dig very deep into anything else.

I think it would be best if I start with something straightforward. I presume adding tests for higher order derivatives by hand would be such a thing.


#22

Awesome! Yes, the first thing is to just add test cases for any higher order function that doesn’t have tests. If you don’t mind, please take a look at the Math library’s contributing guidelines: https://github.com/stan-dev/math/blob/develop/.github/CONTRIBUTING.md

It would help to write an issue and then create a PR to close that issue when it’s merged. That way we can keep track of what’s going on. We manage our releases by looking at which issues were closed during that release.

If you run into any problems or think of ways to improve the process, just post on the forums. Thanks!


#23

@syclik I have been doing a lot of digging around the unit tests. Now, before I go opening any issue I need a little bit of help, because I got a little lost in the process. I assume the higher order derivative tests for all kinds of functions are in the /test/unit/math/mix directory. I looked over the tests for the core functions and I think I get the concept. It may be a dumb question, but I can’t seem to pinpoint where I should start adding tests, because there is a lot going on in there. Could you give me a straightforward clue on where to begin?


#24

Pretty much. The way it’s broken down:

  • test/unit/ is where the unit tests live. We have test/prob where the probability distribution tests live, and we’ve left this level in place so we can add other types of tests in the future.
  • within the test/unit/math/ level, you’ll see different folders. The relevant ones are prim for primitive (without autodiff), rev for reverse mode, fwd for forward mode, and mix for mixed mode. Our reverse mode class is stan::math::var. Our forward mode is templated; things inside fwd only test stan::math::fvar<double>. For mixed mode, we mean stan::math::fvar<stan::math::var> (and really fvar<fvar<var>>).
  • after that level, everything is arranged parallel to the source code.
  • tests… all the test files are named *_test.cpp. We use the name to know that it needs to be built with the Google testing framework.

Definitely not a dumb question.

If you’re looking to start by adding simple tests, you can pick a function inside stan/math/prim/scal/fun/*.hpp and test it (either in forward mode or mixed mode). It really helps to write a couple tests out by hand before trying to embark on a well-designed testing framework.

For example, there are 125 different files in scalar functions but only 84 tests in forward mode:

$ ls stan/math/prim/scal/fun/*.hpp | wc -l
     125
$ ls test/unit/math/fwd/scal/fun/*_test.cpp | wc -l
      84

For one concrete example, there’s no F32 (hypergeometric 3 2) test, so you could start there. (I’m looking at the code and this one might actually have trouble, so maybe something easier?) Let’s assume we go with this one. What are we testing in forward mode:

  1. that it compiles. Since we have 0 tests for this, we don’t know if this will actually compile when instantiated with forward mode. And that’s because it’s templated and it won’t fail compilation until instantiation.
  2. that the function value matches what we expect. This may seem trivial, but as we build better functions, we’re often specializing functions so even this trivial condition needs to be checked.
  3. that each of the directional derivatives is correct. This function has 6 arguments that are templated. So I’d expect that each directional derivative is checked (you’ll need to run this 6 times per set of inputs).
  4. error conditions are properly caught. Here, there’s no input checking, but the doc clearly states what’s valid. So there’s nothing to do here, but in other functions there might be places where we throw.
  5. this is tricky, but we want the boundary conditions to have similar properties to prim and rev. We can discuss this when you get there.

As a kick start, you’d put this into test/unit/math/fwd/scal/fun/F32_test.cpp:

#include <stan/math/fwd/scal.hpp>
#include <gtest/gtest.h>

TEST(Fwd, F32) {
}

We include the math header first; this is for forward mode and scalar functions. These high-level includes guarantee that the template metaprograms are set up in the correct order.

Then include the Google test header.

Then write test cases.

Let me know if you get stuck anywhere.


#25

Do we want to start doing this before we have the lower-level testing framework in place? It will let us make progress, but will also require us to rewrite everything when we get the general framework.

Will it work if it compiles?


#26

Yes, I think this is worth it. If it helps us fix parts of the math library that aren’t up to speed, then it’s worth it.

The parts of the tests that are reusable:

  • the test cases chosen for the test

The parts that aren’t reusable:

  • the boilerplate: instantiating the autodiff stack, looping, etc.

If someone writes a test out a couple of times, the boilerplate should be pretty quick to write and most of the effort will be in thinking about the test cases, which is what needs to happen with a testing framework anyway.

Will it work if it compiles?

I don’t think so. I could be wrong, but I think because of the way we’ve specialized certain functions, this isn’t true. It’d be nice to know under what conditions it’s not (if it’s some misuse of a template metaprogram, etc).

It might be, though. So if we can establish that by testing thoroughly a few times, I think that effort is worth it.


#27

The test case inputs are a few lines of code and easy to build. The boilerplate runs to thousands of lines in many cases.

Then there’s something wrong with our basic design, I think. That may be the case, but I’d like to test that rather than having tons of redundant tests. At least until we have a framework.


#28

Which test cases are you thinking about? I thought it was a lot less. (I’m thinking about fvar<double> and scalar functions first.)

I think it’d still be good having some of these tests in place so we can move forward. At the very least, if we want higher order autodiff to work, we do need more people familiar with the internals down to being able to instantiate functions. Yes, having a testing framework will make that easier, but it won’t help with understanding what goes on under the hood.


#29

In mix/mat/fun, the file gp_periodic_cov_test.cpp takes the cake at 20,000 lines (that’s not a typo—it’s over a megabyte of text). But even crossprod_test.cpp breaks 1000 lines.

Even down in mix/scal/fun, you get some pretty big tests.

$ wc grad_F32_test.cpp
     961    2154   33769 grad_F32_test.cpp

That’s nearly 1000 lines, and that’s just the mix test. There are another 100 lines in the prim test. But that’s just a scalar function.


#30

Thanks. I’ll take a look to see what that’s about.


#31

Wow. While that is a lot of testing, it doesn’t really test the implementation. It just tests the different error conditions, exhaustively. It doesn’t actually try to take the derivative or check the value of the function. So we know it won’t crash, but we don’t know if it’s a correct implementation.

I don’t think taking away the boilerplate code would have helped with the testing of this function. There should have been a little more guidance at the pull request stage.

For: grad_F32_test.cpp
That’s 30 different test cases in there. Each one is ~30 lines long. And you’re right, if we had the test framework in place, each test case would have been ~5 lines long instead (so maybe around ~150 lines). But we could easily have used the existing Google Test framework to build that for this test.

This one has all the arguments being the same type, so this is pretty easy. (That’s how the function was written.) If it actually allowed different types, it’d be a lot harder.


#32

Thanks for the thorough explanation. I definitely need to write a few test cases to get familiar with the overall approach. I tried to implement the test for F32, and it looks like it has some problems as you mentioned, or maybe I am missing something (also: we should probably move this discussion somewhere else). I wrote a test for the function value and it fails to compile with

error: no matching function for call to 'F32(stan::math::fvar<double>&, double&, double&, double&, double&, double&)'
  fvar<double> r = F32(a1,a2c,a3c,b1c,b2c,zc);

the test code being:

TEST(Fwd, F32) {
	using stan::math::fvar;
	using stan::math::F32;
	using stan::math::exp;

	fvar<double> a1(1.0, 1.0);
	fvar<double> a2(1.0, 1.0);
	fvar<double> a3(1.0, 1.0);
	fvar<double> b1(1.0, 1.0);
	fvar<double> b2(1.0, 1.0);
	fvar<double> z(0.6, 1.0);
	
	double a1c = 1.0;
	double a2c = 1.0;
	double a3c = 1.0;
	double b1c = 1.0;
	double b2c = 1.0;
	double zc = 0.6;
	
	fvar<double> r = F32(a1,a2c,a3c,b1c,b2c,zc);
	EXPECT_FLOAT_EQ(F32(1.0,1.0,1.0,1.0,1.0,0.6), r.val_);
}

meaning that it can’t be instantiated with different argument types, or at least that’s how I understand it?

Next, I tried to tackle the distance function for forward mode; here is the full code. I would really appreciate some comments, to see if I’m on the right track. This test also fails, the problem being the value of the derivative when both arguments are equal. As I understand it, the explicit derivative d/dx1 of the distance function should be (x1-x2)/abs(x1-x2). That should result in NaN if you put in the same value for x1 and x2, but the derivative comes out as 0. Why is that?


#33

I think this is fine, unless we spend the whole time talking about F32, but I don’t think we will.

Yes, you’re missing something critical here. The function implemented is here:

If you take a look at the definition of the function:

template <typename T>
T F32(const T& a1, const T& a2, const T& a3, const T& b1, const T& b2,
      const T& z, double precision = 1e-6, int max_steps = 1e5)

you’ll see that there’s only one template type, T, and all the arguments (besides precision and max_steps) are of type T. That means you can’t use fvar<double> for the first argument and double for the rest… they all have to be the same type.

Would it be possible to put this on GitHub so I can make inline comments? It’s hard to do this over pastebin.

You’ll have to dig through the implementation.

For forward mode, this starts here:

Inside dot_product, you see that there’s just element-wise multiplication. 0 * 0 has a derivative of 0.

So… that’s the implementation.

The question you’re asking is whether that’s the correct implementation; that’s a good question to be asking. At discontinuous points, we have to make a decision. Do we go with NaN, 0, inf, or something else that makes sense / doesn’t make sense?

I don’t remember what we decided here. It really would have helped to have more documentation in the function, or a test that shows the behavior on this boundary, but we don’t have that. Once upon a time, I think we decided that at isolated discontinuities, in some cases, it makes sense to return a finite derivative (like 0) that won’t mess up sampling. I think this is a case where that might hold. In other cases, throwing an exception makes sense, but here, I think it’s reasonable to take the distance of two identical vectors and get 0. I don’t think we really want the derivatives / gradients to be destroyed at that point due to passing in duplicate vectors. So 0 makes sense here?

It’s an open question, though. Thoughts?


#34

Would you consider a template default policy as an answer?


#35

Yes, but I don’t think it’s worth the maintenance. I could be convinced otherwise.

I think we could be explicit and say what happens at that boundary.