Need help with binary vectorization test

I’ve found some time lately to work again on binary vectorization. In particular, I’m trying to understand an error that I get while testing the mix scalar, int vector case, but I need some help. The toy function that I’m using is a constrained pow function so I get the error when I’m essentially testing the case of pow(0, -4).

The code flow, which is similar to the unary vectorization, is the following:

  1. In test/unit/math/mix/mat/vectorize/expect_mix_binary_std_vector_value.hpp, the mix scalar, int vector test case is handled in the function expect_mix_binary_scalar_std_vector_eq.
  2. There, the mix scalar and int vectors are built. F::apply and F::apply_base are used to generate the “test” variable and “expected” variable respectively. Call our expected variables exp_var and our test variables test_var.
  3. expect_binary_val_derive_eq from test/unit/math/mix/mat/vectorize/expect_binary_val_deriv_eq.hpp is called to unspool the mix variables and get the rev variables, which are then tested in test/unit/math/rev/mat/vectorize/expect_binary_val_deriv_eq.hpp.

When the test fails in the case of pow(0, -4), exp_var.d_'s reverse mode gradient is -inf whereas test_var.d_'s reverse mode gradient is nan. However, if F::apply_base is called again or exp_var.d_'s gradient is taken inside of functions in test/unit/math/mix/mat/vectorize/expect_binary_val_deriv_eq.hpp or test/unit/math/mix/mat/vectorize/expect_mix_binary_std_vector_value.hpp, exp_var.d_'s reverse mode gradient is nan and so the tests should pass.

To understand this error, I’ve also checked that exp_var.d_'s gradient remains invariant in test/unit/math/mix/mat/vectorize/expect_binary_val_deriv_eq.hpp even after setting all adjoints to zero or taking test_var.d_'s reverse gradient. Further, F::apply_base's return type is correct and all the rev and fwd mode binary vectorization tests pass. I also checked the existing scalar mix pow test to see if this error might come up there, but the test only tests cases in which the arguments are all positive.

So, I’m wondering what else I can check to debug this error. All the code can be found in the branch feature/issue-202-vectorize-all-binary. The branch is up to date with develop.

(Also, this topic is created under the category of General because I couldn’t put it under the category of Developers.)

It may be because the fvar constructors have some whacky constructors that automatically set the derivative to not-a-number if the value is not-a-number.

I was a little confused. If you have pow(x, y) and x = 0 and y=4 in reverse mode, what’s the gradient w.r.t. y? It should be zero, not infinite. One problem may be that pow(x, y) isn’t defined for x < 0, so derivatives only make sense from one side.

Also, because pow is a built-in, we don’t automatically throw exceptions when given a not-a-number input—we follow the way the default C++ function works.

For the test case that’s causing the bug, y = -4.

Based on your comment, I went back and checked the testing framework. I do test a similar case, i.e. pow(0, 5). That returns 0 as expected.

Also, that makes sense. We ran into something similar when writing the unary vectorization test framework. For instance, sqrt(-1) returns nan. The unary vectorization tests this as a “valid” entry because it doesn’t throw an exception.

It always helps to provide a minimal test with instructions on how to run it. How to run it is just the name of the test file that failed. So it would have been helpful to let me know that this is where the failure comes from:

test/unit/math/mix/mat/vectorize/binary_foo_test.cpp

Would you mind going and finding a minimal test case that fails and checking that in and letting me know which test file to run to tickle the problem? Ideally without spewing unlabeled output. That’s all the explanation I need as to what you think should happen versus what does happen.

One thing you need to do is put the inputs together. Rather than having vector<double> valid_inputs1() and vector<double> valid_inputs2(), you want to put them together into something like:

vector<pair<double, double> > valid_inputs();

that returns pairs of doubles. Of all the tests there, could you trim them down or create a new test file to restirct to just the one failing that you think should pass?

I modified the test so it’s just one call, and it still just fails rather than catching an exception:

test/unit/math/mix/mat/vectorize/binary_foo_test --gtest_output="xml:test/unit/math/mix/mat/vectorize/binary_foo_test.xml"
Running main() from gtest_main.cc
[==========] Running 3 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 3 tests from mix_scalar_binary_test/0, where TypeParam = binary_foo_test
[ RUN      ] mix_scalar_binary_test/0.expect_scalar_types
[       OK ] mix_scalar_binary_test/0.expect_scalar_types (0 ms)
[ RUN      ] mix_scalar_binary_test/0.expect_values
test/unit/math/mix/mat/vectorize/binary_foo_test --gtest_output="xml:test/unit/math/mix/mat/vectorize/binary_foo_test.xml" failed
exit now (06/28/17 15:15:57 EDT)
  /**
   * Return sequence of valid double-valued inputs.
   */
  static std::vector<double> valid_inputs1() {
    return test::math::vector_builder<double>()
      // .add(0.7).add(2.3).add(3.5).add(0).add(0).add(0)
      // .add(-0.3).add(-5.3).add(-3.7)
      .build();
  }

  /**
   * Return sequence of valid double-valued inputs.
   */
  static std::vector<double> valid_inputs2() {
    return test::math::vector_builder<double>()
      // .add(1.3).add(-2.6).add(0).add(-1.2).add(2.3).add(0)
      // .add(-3.7).add(0).add(2.3)
      .build();
  }

  /**
   * Return sequence of invalid double-valued inputs.
   */
  static std::vector<double> invalid_inputs1() {
    return test::math::vector_builder<double>()
      // .add(2.7).add(12.3).add(0).add(-5.3).add(10.4).add(15.2)
      .build();
  }

  /**
   * Return sequence of invalid double-valued inputs.
   */
  static std::vector<double> invalid_inputs2() {
    return test::math::vector_builder<double>()
      // .add(10.7).add(2.3).add(15.2).add(26.3).add(0).add(-2.6)
      .build();
  }

  /**
   * Return sequence of valid integer inputs.
   */
  static std::vector<int> int_valid_inputs1() {
    return test::math::vector_builder<int>()
      .add(1)
      // .add(6).add(3).add(7).add(0).add(0).add(0)
      // .add(-2).add(-5).add(-4)
      .build();
  }

  /**
   * Return sequence of valid integer inputs.
   */
  static std::vector<int> int_valid_inputs2() {
    return test::math::vector_builder<int>()
      .add(2)
      // .add(3).add(-2).add(0).add(5).add(0).add(-4)
      // .add(0).add(6).add(-3)
      .build();
  }

  /**
   * Return sequence of invalid integer inputs.
   */
  static std::vector<int> int_invalid_inputs1() {
    return test::math::vector_builder<int>()
      // .add(3).add(12).add(0).add(-5).add(11).add(15)
      .build();
  }

  /**
   * Return sequence of invalid integer inputs.
   */
  static std::vector<int> int_invalid_inputs2() {
    return test::math::vector_builder<int>()
      // .add(11).add(2).add(15).add(26).add(0).add(-2)
      .build();
  }
};

So I think there’s some work to do on the general testing framework first.

I updated the branch to create a minimal test case that still fails. If you run ./runTests.py test/unit/math/mix/mat/vectorize/binary_foo_test.cpp, you should get the error. Multiple files are used so I also wanted to include a list of files in order of when they’re used:

1. test/unit/math/mix/mat/vectorize/binary_foo_test.cpp
2. test/unit/math/mix/mat/vectorize/mix_scalar_binary_test.hpp 
3. test/unit/math/mix/mat/vectorize/expect_mix_binary_values.hpp
4. test/unit/math/mix/mat/vectorize/expect_mix_binary_std_vector_value.hpp 
(The test case is essentially in the function expect_mix_binary_scalar_std_vector_eq. 
Note that expect_binary_val_deriv_eq is called twice.)
5. test/unit/math/mix/mat/vectorize/expect_binary_val_deriv_eq
6. test/unit/math/rev/mat/vectorize/expect_binary_val_deriv_eq
7. test/unit/math/prim/mat/vectorize/expect_val_eq.hpp

Note that it only errors out once even though expect_binary_val_deriv_eq is called twice in test/unit/math/mix/mat/vectorize/expect_mix_binary_std_vector_value.hpp. I commented out the debug code, but I can delete it for readability if you like.

As for your second point, I can be convinced otherwise, but I’m building two vectors because it’s more flexible and it allows me to build off of the logic of the unary vectorization framework. After all, the testing framework includes tests for the scalar, std::vector, scalar, matrix/vector, std::vector, std::vector, and matrix/vector, matrix/vector case. For the first two cases, the pair formulation works well because it’s easy to broadcast a scalar into a vector or matrix. However, for the latter case, I want to test using vectors or matrices of different values. Using the pair formulation will involve looping through all possible combinations. However, if this isn’t convincing, I can work on changing the framework to use pair's instead of std::vector's.

I don’t think you want to loop through all possible combinations because they won’t always make sense. That is, you can have consistent arguments (x1, y1) and (x2, y2) where (x1, y2) doesn’t make sense.

An example is atan2(x, y), where you can have x or y equal to zero but not both:

http://www.cplusplus.com/reference/cmath/atan2/

I agree and wasn’t clear with my last message. It’s implicit, but right now, if I have

std::vector1<double> a1 = {2, 3};
std::vector2<double> a2 = {1, -1}; 

then for the following two cases (there are other cases it considers):

  1. In the scalar, std::vector case, the framework will test pow(2, std::vector<double>(5, 1)) and pow(3, std::vector<double>(5, -1)).
  2. In the std::vector, std::vector case, the framework will test pow(a1, a2), i.e. {pow(2, 1), pow(3, -1)}.

My comment for the std::vector, std::vector case was about whether it should instead test pow(std::vector<double>(5, 2), std::vector<double>(5, 1)) and pow(std::vector<double>(5, 3), std::vector<double>(5, -1)) instead of pow(a1, a2). pow(std::vector<double>(5, 2), std::vector<double>(5, 1)) would be easier to test with the pair formulation.

I understood that you were trying to use parallel sequences, one in one vector and one in another. When possible, it’s better to keep things together that go together, so I would only do it with parallel structures as a last resort.

Because of the boundary condition issue, you might need to test the vectorization separate with binary cases. You can have a bunch of pairs to test, but you could also have pair<set<double>, set<double>> where the idea would be that each member of the sets would be compatible with each member of the other set.

I pushed a script called tmp_test.cpp in the latest commit to the branch feature/issue-202-vectorize-all-binary. It’s basically a sandbox to compare the 2nd derivatives of applying binary_foo (the vectorized function) on fvar<var>, int and on fvar<var>, std::vector<int> for the problematic case mentioned above of x = 0, y = -4 and the 2nd derivative involving fvar<var>.d_.grad().

Using the script, I noticed the following. The 2nd derivative of applying binary_foo on fvar<var>, int is -Inf. I think this is the correct value because that’s what I got when playing around with the mix pow test. Then, for clarity, I’m going to include a summary of the vectorization code:

std::vector<int> y_vector(5, -4);
std::vector<fvar<var> > result_v(y_vector.size());
for (size_t i = 0; i < std::vector<int>.size(); ++i) {
  result_v = binary_foo(x, y_vector[i]);
}
  • If I take the derivative within the for loop with respect to x.d_, the derivative is

    • -Inf for i = 0
    • nan otherwise
  • If I refresh the memory using recover_memory_nested() within the loop or rebuild the fvar, then the derivative is -Inf for all i. However, if I do this, it doesn’t seem like I can take the derivative with respect to my original fvar x because the 2nd derivative of the first element of result_v is 0.

  • If I use recover_memory() within the for loop (as seen in the script), the 2nd derivative of the first element of result_v is Inf .

  • If I leave the vectorization code as is, I get nan for my derivative.

All of this makes me think the issue is that I’m not handling memory correctly in my attempt to broadcast. Is there a good example in the math library for how to do this? I’ve been looking at probability functions and the add function in prim/mat/fun, but I’m wondering if there’s another function that I can look at. The script can be found in test/unit/math/mix/mat/vectorize and is commented.

The code you have for the vectorization is right other than that I think you mean y_vector.size() in the loop bound. If you try this just by itself, you should see you get the right derivatives—it’s exactly how a loop in Stan would get compiled.

I don’t even know what you mean by taking the derivative in the for loop here. You get a running update with forward mode, but for reverse, you have to wait until you’re done building the expression. You shouldn’t recover memory in that loop at all. None of the vectorization code should be recovering memory, only the calling code.

You should only be getting nan for derivatives if that’s what the unvectorized form produces.

You’re looking for other scalar binary functions? atan2 should be a simple one that can produce nan outputs from non-nan inputs. I don’t know how well it’s tested up to higher order.

Also, for the higher order tests, you have to be careful how you initialize. I think you already did this for the unary vectorization, but if you want a refresher, I’d recommend the functionals for things like Hessians.

The unvectorized form produces -Inf for the 2nd derivative whereas the vectorized form produces nan for the 2nd derivative.

I don’t think this happened in the unary case so I was curious what was different. Within the testing frame work, the binary vectorization uses the following code to build std::vector<fvar<T> >

  vector<FV> template_v1 = build_binary_vector1<F>(vector<FV>(),
                                                                                      seed_index);
  vector<FV> template_v2;
  if (seed_index != -1)
    template_v2 = build_binary_vector1<F>(vector<FV>(), seed_index);

In other words, it builds a separate vector for fvar’s val and d_ if we’re interested in the fvar’s derivative. This is how it is done in the unary case when we are building mix std::vector.

As such, one thing that I saw was different was broadcasting because broadcasting isn’t done in unary vectorization. So, I wrote tmp_test.cpp to only look at the problematic case and play around with the back-end code. One of my experiments was taking the derivative in the for loop, which was the following (this is from the first commit for tmp_test.cpp):

  ...
  fvar<var> x2(0, 0);
  vector<int> a2(5, -4);
  ...
  for (size_t i = 0; i < result2.size(); ++i) {
    stan::math::set_zero_all_adjoints();

    /*
      If the line below is commented out, then 2nd derivative of all 
      but the first element of result2 is nan whereas the 2nd derivative of the
      first element of result2 is -Inf.
    */
    x2 = fvar<var>(0, 0);
    result2[i] = binary_foo(x2, a2[0]); 
    AVEC exp_t = createAVEC(x2.d_);
    VEC exp_tg;
    result2[i].d_.grad(exp_t, exp_tg);
    std::cout << "Vector case " << i << " 2nd derivative: " 
              << exp_tg[0] << std::endl;
  }

And so, sorry for not being clear, but I was curious about other functions that has broadcasting one parameter to the size of another parameter.

Nope, no such broadcasting yet.

One issue you may be running into here is that there’s a hack in fvar (which I believe should be removed, but it’s a big job because of existing tests) which makes sure that the derivative is NaN if the value is NaN. Is this a case where there’s also a NaN value?

You might find it useful to look at the functional rev/mat/functor/jacobian.hpp which shows you how to manage the memory. What you want to do is build up the entire expression graph, then start calculating reverse-mode derivatives, being careful to do that set_zero each time.

I’ll make sure to look at rev/mat/functor/jacobian.hpp.

Following the code in my last post, result2[0].val() returns Inf, but result2[0].d_ returns NaN. I know that the issue only comes up with respect to result2[0].d_.grad(), but is this related to the fvar hack you’re talking about?

I think it only makes sure the tangent is NaN if the value is NaN.

I see. Then, I haven’t finished writing the tests, but if I can get the tests to pass for everything but the broadcasting cases, do you think that will be enough of a first step for binary vectorization? The tests all pass for rev and fwd so I feel that I’m close, but trying to understand this error has been holding back the whole thing for at least a year and it might be nice to have some vectorization functionality for binary functions.

Sorry you’re stuck on this.

When you say enough, you mean not testing the foo(real, vector) and foo(vector, real) cases? I don’t think that’ll be enough. You could write something to test them separately from the other framework. Or I could try to jump in and help, but I’m totally swamped at the moment.

I thought that it might be enough for a first step because unary vectorization didn’t involve broadcasting. Without broadcasting, binary vectorization would just be an extension of unary vectorization, only for functions with two arguments. However, I’m willing to keep digging to make broadcasting a part of the first step for binary vectorization.

Oh, do you mean can you roll out the functions that just take pairs of vectors as arguments? That would be OK.