# Need help with binary vectorization test

#1

I’ve found some time lately to work again on binary vectorization. In particular, I’m trying to understand an error that I get while testing the mix scalar, int vector case, but I need some help. The toy function that I’m using is a constrained `pow` function so I get the error when I’m essentially testing the case of `pow(0, -4)`.

The code flow, which is similar to the unary vectorization, is the following:

1. In `test/unit/math/mix/mat/vectorize/expect_mix_binary_std_vector_value.hpp`, the mix scalar, int vector test case is handled in the function `expect_mix_binary_scalar_std_vector_eq`.
2. There, the mix scalar and int vectors are built. `F::apply` and `F::apply_base` are used to generate the “test” variable and “expected” variable respectively. Call our expected variables exp_var and our test variables test_var.
3. `expect_binary_val_derive_eq` from `test/unit/math/mix/mat/vectorize/expect_binary_val_deriv_eq.hpp` is called to unspool the mix variables and get the rev variables, which are then tested in `test/unit/math/rev/mat/vectorize/expect_binary_val_deriv_eq.hpp`.

When the test fails in the case of `pow(0, -4)`, exp_var.d_'s reverse mode gradient is -inf whereas test_var.d_'s reverse mode gradient is nan. However, if `F::apply_base` is called again or exp_var.d_'s gradient is taken inside of functions in `test/unit/math/mix/mat/vectorize/expect_binary_val_deriv_eq.hpp` or `test/unit/math/mix/mat/vectorize/expect_mix_binary_std_vector_value.hpp`, exp_var.d_'s reverse mode gradient is nan and so the tests should pass.

To understand this error, I’ve also checked that exp_var.d_'s gradient remains invariant in `test/unit/math/mix/mat/vectorize/expect_binary_val_deriv_eq.hpp` even after setting all adjoints to zero or taking test_var.d_'s reverse gradient. Further, `F::apply_base`'s return type is correct and all the rev and fwd mode binary vectorization tests pass. I also checked the existing scalar mix `pow` test to see if this error might come up there, but the test only tests cases in which the arguments are all positive.

So, I’m wondering what else I can check to debug this error. All the code can be found in the branch `feature/issue-202-vectorize-all-binary`. The branch is up to date with develop.

(Also, this topic is created under the category of General because I couldn’t put it under the category of Developers.)

#2

It may be because the `fvar` constructors have some whacky constructors that automatically set the derivative to not-a-number if the value is not-a-number.

I was a little confused. If you have `pow(x, y)` and `x = 0` and `y=4` in reverse mode, what’s the gradient w.r.t. `y`? It should be zero, not infinite. One problem may be that `pow(x, y)` isn’t defined for `x < 0`, so derivatives only make sense from one side.

Also, because `pow` is a built-in, we don’t automatically throw exceptions when given a not-a-number input—we follow the way the default C++ function works.

#3

For the test case that’s causing the bug, `y = -4`.

Based on your comment, I went back and checked the testing framework. I do test a similar case, i.e. `pow(0, 5)`. That returns 0 as expected.

Also, that makes sense. We ran into something similar when writing the unary vectorization test framework. For instance, `sqrt(-1)` returns nan. The unary vectorization tests this as a “valid” entry because it doesn’t throw an exception.

#4

It always helps to provide a minimal test with instructions on how to run it. How to run it is just the name of the test file that failed. So it would have been helpful to let me know that this is where the failure comes from:

``````test/unit/math/mix/mat/vectorize/binary_foo_test.cpp
``````

Would you mind going and finding a minimal test case that fails and checking that in and letting me know which test file to run to tickle the problem? Ideally without spewing unlabeled output. That’s all the explanation I need as to what you think should happen versus what does happen.

#5

One thing you need to do is put the inputs together. Rather than having `vector<double> valid_inputs1()` and `vector<double> valid_inputs2()`, you want to put them together into something like:

``````vector<pair<double, double> > valid_inputs();
``````

that returns pairs of doubles. Of all the tests there, could you trim them down or create a new test file to restirct to just the one failing that you think should pass?

I modified the test so it’s just one call, and it still just fails rather than catching an exception:

``````test/unit/math/mix/mat/vectorize/binary_foo_test --gtest_output="xml:test/unit/math/mix/mat/vectorize/binary_foo_test.xml"
Running main() from gtest_main.cc
[==========] Running 3 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 3 tests from mix_scalar_binary_test/0, where TypeParam = binary_foo_test
[ RUN      ] mix_scalar_binary_test/0.expect_scalar_types
[       OK ] mix_scalar_binary_test/0.expect_scalar_types (0 ms)
[ RUN      ] mix_scalar_binary_test/0.expect_values
test/unit/math/mix/mat/vectorize/binary_foo_test --gtest_output="xml:test/unit/math/mix/mat/vectorize/binary_foo_test.xml" failed
exit now (06/28/17 15:15:57 EDT)
``````
``````  /**
* Return sequence of valid double-valued inputs.
*/
static std::vector<double> valid_inputs1() {
return test::math::vector_builder<double>()
.build();
}

/**
* Return sequence of valid double-valued inputs.
*/
static std::vector<double> valid_inputs2() {
return test::math::vector_builder<double>()
.build();
}

/**
* Return sequence of invalid double-valued inputs.
*/
static std::vector<double> invalid_inputs1() {
return test::math::vector_builder<double>()
.build();
}

/**
* Return sequence of invalid double-valued inputs.
*/
static std::vector<double> invalid_inputs2() {
return test::math::vector_builder<double>()
.build();
}

/**
* Return sequence of valid integer inputs.
*/
static std::vector<int> int_valid_inputs1() {
return test::math::vector_builder<int>()
.build();
}

/**
* Return sequence of valid integer inputs.
*/
static std::vector<int> int_valid_inputs2() {
return test::math::vector_builder<int>()
.build();
}

/**
* Return sequence of invalid integer inputs.
*/
static std::vector<int> int_invalid_inputs1() {
return test::math::vector_builder<int>()
.build();
}

/**
* Return sequence of invalid integer inputs.
*/
static std::vector<int> int_invalid_inputs2() {
return test::math::vector_builder<int>()
.build();
}
};
``````

So I think there’s some work to do on the general testing framework first.

#6

I updated the branch to create a minimal test case that still fails. If you run `./runTests.py test/unit/math/mix/mat/vectorize/binary_foo_test.cpp`, you should get the error. Multiple files are used so I also wanted to include a list of files in order of when they’re used:

``````1. test/unit/math/mix/mat/vectorize/binary_foo_test.cpp
2. test/unit/math/mix/mat/vectorize/mix_scalar_binary_test.hpp
3. test/unit/math/mix/mat/vectorize/expect_mix_binary_values.hpp
4. test/unit/math/mix/mat/vectorize/expect_mix_binary_std_vector_value.hpp
(The test case is essentially in the function expect_mix_binary_scalar_std_vector_eq.
Note that expect_binary_val_deriv_eq is called twice.)
5. test/unit/math/mix/mat/vectorize/expect_binary_val_deriv_eq
6. test/unit/math/rev/mat/vectorize/expect_binary_val_deriv_eq
7. test/unit/math/prim/mat/vectorize/expect_val_eq.hpp
``````

Note that it only errors out once even though `expect_binary_val_deriv_eq` is called twice in test/unit/math/mix/mat/vectorize/expect_mix_binary_std_vector_value.hpp. I commented out the debug code, but I can delete it for readability if you like.

#7

As for your second point, I can be convinced otherwise, but I’m building two vectors because it’s more flexible and it allows me to build off of the logic of the unary vectorization framework. After all, the testing framework includes tests for the `scalar, std::vector`, `scalar, matrix/vector`, `std::vector, std::vector`, and `matrix/vector, matrix/vector` case. For the first two cases, the pair formulation works well because it’s easy to broadcast a scalar into a vector or matrix. However, for the latter case, I want to test using vectors or matrices of different values. Using the pair formulation will involve looping through all possible combinations. However, if this isn’t convincing, I can work on changing the framework to use `pair`'s instead of `std::vector`'s.

#8

I don’t think you want to loop through all possible combinations because they won’t always make sense. That is, you can have consistent arguments `(x1, y1)` and `(x2, y2)` where `(x1, y2)` doesn’t make sense.

An example is `atan2(x, y)`, where you can have `x` or `y` equal to zero but not both:

http://www.cplusplus.com/reference/cmath/atan2/

#9

I agree and wasn’t clear with my last message. It’s implicit, but right now, if I have

``````std::vector1<double> a1 = {2, 3};
std::vector2<double> a2 = {1, -1};
``````

then for the following two cases (there are other cases it considers):

1. In the `scalar, std::vector` case, the framework will test `pow(2, std::vector<double>(5, 1))` and `pow(3, std::vector<double>(5, -1))`.
2. In the `std::vector, std::vector` case, the framework will test `pow(a1, a2)`, i.e. `{pow(2, 1), pow(3, -1)}`.

My comment for the `std::vector, std::vector` case was about whether it should instead test `pow(std::vector<double>(5, 2), std::vector<double>(5, 1))` and `pow(std::vector<double>(5, 3), std::vector<double>(5, -1))` instead of `pow(a1, a2)`. `pow(std::vector<double>(5, 2), std::vector<double>(5, 1))` would be easier to test with the pair formulation.

#10

I understood that you were trying to use parallel sequences, one in one vector and one in another. When possible, it’s better to keep things together that go together, so I would only do it with parallel structures as a last resort.

Because of the boundary condition issue, you might need to test the vectorization separate with binary cases. You can have a bunch of pairs to test, but you could also have `pair<set<double>, set<double>>` where the idea would be that each member of the sets would be compatible with each member of the other set.

#11

I pushed a script called `tmp_test.cpp` in the latest commit to the branch `feature/issue-202-vectorize-all-binary`. It’s basically a sandbox to compare the 2nd derivatives of applying `binary_foo` (the vectorized function) on `fvar<var>, int` and on `fvar<var>, std::vector<int>` for the problematic case mentioned above of `x = 0`, `y = -4` and the 2nd derivative involving `fvar<var>.d_.grad()`.

Using the script, I noticed the following. The 2nd derivative of applying `binary_foo` on `fvar<var>, int` is `-Inf`. I think this is the correct value because that’s what I got when playing around with the mix pow test. Then, for clarity, I’m going to include a summary of the vectorization code:

``````std::vector<int> y_vector(5, -4);
std::vector<fvar<var> > result_v(y_vector.size());
for (size_t i = 0; i < std::vector<int>.size(); ++i) {
result_v = binary_foo(x, y_vector[i]);
}
``````
• If I take the derivative within the `for` loop with respect to `x.d_`, the derivative is

• `-Inf` for i = 0
• `nan` otherwise
• If I refresh the memory using `recover_memory_nested()` within the loop or rebuild the fvar, then the derivative is `-Inf` for all i. However, if I do this, it doesn’t seem like I can take the derivative with respect to my original fvar x because the 2nd derivative of the first element of `result_v` is `0`.

• If I use `recover_memory()` within the for loop (as seen in the script), the 2nd derivative of the first element of `result_v` is `Inf` .

• If I leave the vectorization code as is, I get `nan` for my derivative.

All of this makes me think the issue is that I’m not handling memory correctly in my attempt to broadcast. Is there a good example in the math library for how to do this? I’ve been looking at probability functions and the `add` function in `prim/mat/fun`, but I’m wondering if there’s another function that I can look at. The script can be found in `test/unit/math/mix/mat/vectorize` and is commented.

#12

The code you have for the vectorization is right other than that I think you mean `y_vector.size()` in the loop bound. If you try this just by itself, you should see you get the right derivatives—it’s exactly how a loop in Stan would get compiled.

I don’t even know what you mean by taking the derivative in the for loop here. You get a running update with forward mode, but for reverse, you have to wait until you’re done building the expression. You shouldn’t recover memory in that loop at all. None of the vectorization code should be recovering memory, only the calling code.

You should only be getting `nan` for derivatives if that’s what the unvectorized form produces.

You’re looking for other scalar binary functions? `atan2` should be a simple one that can produce `nan` outputs from non-`nan` inputs. I don’t know how well it’s tested up to higher order.

Also, for the higher order tests, you have to be careful how you initialize. I think you already did this for the unary vectorization, but if you want a refresher, I’d recommend the functionals for things like Hessians.

#13

The unvectorized form produces `-Inf` for the 2nd derivative whereas the vectorized form produces `nan` for the 2nd derivative.

I don’t think this happened in the unary case so I was curious what was different. Within the testing frame work, the binary vectorization uses the following code to build std::vector<`fvar<T> >`

``````  vector<FV> template_v1 = build_binary_vector1<F>(vector<FV>(),
seed_index);
vector<FV> template_v2;
if (seed_index != -1)
template_v2 = build_binary_vector1<F>(vector<FV>(), seed_index);
``````

In other words, it builds a separate vector for fvar’s val and d_ if we’re interested in the fvar’s derivative. This is how it is done in the unary case when we are building mix std::vector.

As such, one thing that I saw was different was broadcasting because broadcasting isn’t done in `unary vectorization`. So, I wrote `tmp_test.cpp` to only look at the problematic case and play around with the back-end code. One of my experiments was taking the derivative in the for loop, which was the following (this is from the first commit for `tmp_test.cpp`):

``````  ...
fvar<var> x2(0, 0);
vector<int> a2(5, -4);
...
for (size_t i = 0; i < result2.size(); ++i) {

/*
If the line below is commented out, then 2nd derivative of all
but the first element of result2 is nan whereas the 2nd derivative of the
first element of result2 is -Inf.
*/
x2 = fvar<var>(0, 0);
result2[i] = binary_foo(x2, a2[0]);
AVEC exp_t = createAVEC(x2.d_);
VEC exp_tg;
std::cout << "Vector case " << i << " 2nd derivative: "
<< exp_tg[0] << std::endl;
}
``````

And so, sorry for not being clear, but I was curious about other functions that has broadcasting one parameter to the size of another parameter.

#14

One issue you may be running into here is that there’s a hack in `fvar` (which I believe should be removed, but it’s a big job because of existing tests) which makes sure that the derivative is NaN if the value is NaN. Is this a case where there’s also a NaN value?

You might find it useful to look at the functional `rev/mat/functor/jacobian.hpp` which shows you how to manage the memory. What you want to do is build up the entire expression graph, then start calculating reverse-mode derivatives, being careful to do that `set_zero` each time.

#15

I’ll make sure to look at `rev/mat/functor/jacobian.hpp`.

Following the code in my last post, `result2[0].val()` returns `Inf`, but `result2[0].d_` returns `NaN`. I know that the issue only comes up with respect to `result2[0].d_.grad()`, but is this related to the fvar hack you’re talking about?

#16

I think it only makes sure the tangent is NaN if the value is NaN.

#17

I see. Then, I haven’t finished writing the tests, but if I can get the tests to pass for everything but the broadcasting cases, do you think that will be enough of a first step for binary vectorization? The tests all pass for `rev` and `fwd` so I feel that I’m close, but trying to understand this error has been holding back the whole thing for at least a year and it might be nice to have some vectorization functionality for binary functions.

#18

Sorry you’re stuck on this.

When you say enough, you mean not testing the `foo(real, vector)` and `foo(vector, real)` cases? I don’t think that’ll be enough. You could write something to test them separately from the other framework. Or I could try to jump in and help, but I’m totally swamped at the moment.

#19

I thought that it might be enough for a first step because unary vectorization didn’t involve broadcasting. Without broadcasting, binary vectorization would just be an extension of unary vectorization, only for functions with two arguments. However, I’m willing to keep digging to make broadcasting a part of the first step for binary vectorization.

#20

Oh, do you mean can you roll out the functions that just take pairs of vectors as arguments? That would be OK.