Vectorised AD testing - Nested containers

I’ve been working through generalising the (unary) vector functions to take all vector types as inputs, and the new AD testing framework has been a huge time saver. The only problem I’m running into is with testing functions that take nested containers as inputs (e.g., log_sum_exp(std::vector<Eigen::VectorXd>)). There is a test_ad_vectorised function, but that’s for the vectorised scalar functions (exp, etc.).

Is there a way to test these nested container inputs in the AD framework, or is that not possible? If not, what would be the best workaround here? Individual rev/fwd tests for these inputs?

Tagging @Bob_Carpenter (but open to help from all!)

I’m not quite sure what this means. In the log_sum_exp example, you are applying it elementwise to a standard vector of vectors.

The general test framework deals with arbitrary types, but the automatic vectorization testing framework only deals with the existing vectorization.

The best thing to do would be to extend the existing vectorization to deal with the new types rather than writing a parallel framework and also extend the testing framework. I believe @stevebronder has done some preliminary work on generalizing the vectorization to handle complex functions.

The test functions would have to be extended the same way. Or you could just write individual tests of them.
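
For the individual-test route, a minimal reverse-mode sketch could look like the following. The generalised log_sum_exp signature (one scalar output per inner vector) is assumed from the proposal rather than existing code, and the values and tolerance are arbitrary:

#include <stan/math/rev.hpp>
#include <gtest/gtest.h>
#include <cmath>
#include <vector>

TEST(aggLogSumExp, nestedStdVectorRev) {
  using stan::math::var;
  using vector_v = Eigen::Matrix<var, Eigen::Dynamic, 1>;

  std::vector<vector_v> x(2);
  x[0] = vector_v::Constant(3, 0.5);
  x[1] = vector_v::Constant(2, -1.2);

  // Assumed behaviour: log_sum_exp applied elementwise over the inner vectors.
  std::vector<var> y = stan::math::log_sum_exp(x);

  y[0].grad();  // propagate adjoints from the first output

  // Analytic gradient: d/dx_i log_sum_exp(x) = exp(x_i - log_sum_exp(x)).
  for (int i = 0; i < x[0].size(); ++i) {
    double expected = std::exp(x[0](i).val() - y[0].val());
    EXPECT_NEAR(expected, x[0](i).adj(), 1e-10);
  }
  stan::math::recover_memory();
}

A matching forward-mode test would follow the same pattern with fvar<double> tangents in place of adjoints.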

There was some discussion from @seantalts around stanc3 about code-generating these vectorizations rather than writing them in C++ but I have no idea if that idea’s still live.

I’m not quite sure what this means.

I talked about it in more depth in the design doc over here.

The best thing to do would be to extend the existing vectorization to deal with the new types rather than writing a parallel framework and also extend the testing framework. I believe @stevebronder has done some preliminary work on generalizing the vectorization to handle complex functions.

That makes sense to me, I’ll have a look into extending things to work with the new functionality. Thanks!

Thanks for the link to the actual proposal.

This is a bit different from what I’ve called “vectorization”, which is itself different from what the CPU/compiler calls vectorization. In Stan, “vectorization” is used ambiguously, with two meanings (both sketched in code after this list):

  • applying a function elementwise; this can apply to any old function, like generalizing real sqrt(real) to vector sqrt(vector) or generalizing vector softmax(vector) to vector[] softmax(vector[]).

  • allowing scalar functions to be generalized to containers in reductions; this is something like normal_lpdf(real | real, real) being extended to normal_lpdf(reals | reals, reals) where any of the arguments can be real, real[], vector, or row_vector.
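
In Stan Math C++ terms, the two meanings look something like this (using the vectorised exp and the container normal_lpdf from current releases; exact headers and values are illustrative):

#include <stan/math.hpp>
#include <iostream>
#include <vector>

int main() {
  // Meaning 1: a scalar function applied elementwise to a container.
  std::vector<double> x{0.1, 0.2, 0.3};
  std::vector<double> ex = stan::math::exp(x);  // exp of each element

  // Meaning 2: a scalar density generalised to containers as a reduction; any
  // argument may be a scalar or a container and the result is a single sum.
  Eigen::VectorXd y(3);
  y << -1.0, 0.0, 1.0;
  double lp = stan::math::normal_lpdf(y, 0.0, 1.0);  // sum of three log densities

  std::cout << ex[1] << " " << lp << "\n";
  return 0;
}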

The original distinction among std::vector, Eigen::VectorXd, and Eigen::RowVectorXd was intentional, not the result of laziness. I wanted to avoid the confusion you get in R about the type of a linear algebra result and allow minimal type returns for functions like real multiply(row_vector, vector); I wanted arrays to be used for holding elements in a container and linear algebra types to be used whenever there was a proper multivariate function (like in softmax or log_sum_exp, which do not operate elementwise). That got relaxed as I seemed to be the only person happy with that distinction.

There have even been calls to eliminate real[] from the language. I like the generality of saying our basic types are real, int, vector, row_vector, matrix and that we can form arbitrary arrays. It’s very simple to explain and document, though apparently very confusing to a lot of Stan users coming from less strictly typed languages like R or Python.

We are never going to be able to get to where vector, row_vector, and real[] are exchangeable, nor do I think we should aim for that.

This approach assumes that the combination of Eigen::Map + Eigen functions is faster than using standard library functions.

Why would Eigen::Matrix<T, R, C> containers be faster than std::vector? They’re built with the same RAII pattern, though initialization’s a bit different. I don’t know if the compiler designers optimize the standard library beyond what’s possible for Eigen as an external lib. Both Eigen::Matrix and std::vector guarantee memory locality on the values. Is there an example somewhere? I would think rather than using Eigen’s templated operator-, we’d want to get Stan’s version in place that uses custom gradients and can remove a bunch of virtual function calls using a single callback chain() method for all entries.
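
For the custom-gradient point, the shape of the idea is roughly this, using stan::math::precomputed_gradients as the glue; it is not the library’s actual log_sum_exp implementation:

#include <stan/math/rev.hpp>
#include <cmath>
#include <vector>

stan::math::var log_sum_exp_single_vari(
    const Eigen::Matrix<stan::math::var, Eigen::Dynamic, 1>& x) {
  using stan::math::var;

  // Pull the values out once.
  Eigen::VectorXd vals(x.size());
  for (int i = 0; i < x.size(); ++i)
    vals(i) = x(i).val();

  double m = vals.maxCoeff();
  double lse = m + std::log((vals.array() - m).exp().sum());

  // d/dx_i log_sum_exp(x) = softmax(x)_i, computed up front.
  Eigen::VectorXd grad = (vals.array() - lse).exp();

  // One vari on the stack and one chain() call for all entries,
  // instead of per-element virtual dispatch.
  std::vector<var> ops(x.data(), x.data() + x.size());
  std::vector<double> g(grad.data(), grad.data() + grad.size());
  return stan::math::precomputed_gradients(lse, ops, g);
}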

Eigen::Matrix<T, -1, -1> is different from std::vector<std::vector<T>> in that the matrix uses a single array of contiguous memory and stores it in column-major order. That means accessing rows of vector<vector<T>> is much, much faster, because each inner vector is its own contiguous array and no copying is required, whereas pulling a row out of a column-major matrix means gathering strided coefficients. Traversing in column-major order will be faster with matrices.
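
To make the layout point concrete, a row of a vector<vector<T>> can be handed out as a reference, while materialising a row of a column-major matrix copies strided coefficients (a column, by contrast, is contiguous):

#include <Eigen/Dense>
#include <vector>

void layout_example() {
  std::vector<std::vector<double>> a(3, std::vector<double>(4, 1.0));
  const std::vector<double>& row_a = a[1];          // reference only, no copy

  Eigen::MatrixXd m = Eigen::MatrixXd::Ones(3, 4);  // one contiguous column-major block
  Eigen::RowVectorXd row_m = m.row(1);              // copies 4 strided coefficients
  Eigen::VectorXd col_m = m.col(2);                 // column is contiguous: straight copy
  (void)row_a; (void)row_m; (void)col_m;
}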

The bigger question I have is what’s the return type of something like softmax()? It sounds like the proposal uses a uniform return type, which in Stan works out to these three types:

vector softmax(vector);
vector softmax(row_vector);
vector softmax(real[]);

If we do extend these functions, wouldn’t it be more natural to have each return its own type?

For head, we absolutely need to keep our input and output types, so that’d have to be:

vector head(vector, int);
row_vector head(row_vector, int);
real[] head(real[], int);
int[] head(int[], int);

Given the proposed definition of log_softmax, it’ll still only apply to Eigen types as they’re the only ones that support an appropriate .array() method. The document says that Eigen::Map will be used around std::vector, but I don’t see where that happens in the code examples.

Adding all these lambdas isn’t free. Their constructors and destructors will get called and assignments will be made. So I would suggest profiling the resulting functions, which will probably be dominated by exp and log anyway.


Hi Bob, thanks for the feedback!

I wanted arrays to be used for holding elements in a container and linear algebra types to be used whenever there was a proper multivariate function (like in softmax or log_sum_exp, which do not operate elementwise). That got relaxed as I seemed to be the only person happy with that distinction.

I see where you’re coming from, but unfortunately the codebase is in a bit of a limbo state where some math functions are implemented for arrays (e.g., exp, log_sum_exp) but others are not (e.g., log_softmax). So for consistency, we need to either extend these functions for arrays (in the cases where we don’t need to assume that the array is a column or row vector), or not have math functions defined for arrays at all.

Why would Eigen::Matrix<T, R, C> containers be faster than std::vector?

This is less about the speed of accessing elements and more about the mathematical functions themselves, since Eigen employs SIMD vectorisation and lazy evaluation for its functions. These kinds of performance differences are being seen over in this pull, where Eigen’s vector functions are outperforming the existing element-wise functions.
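
For a sense of what that difference looks like in code, here is roughly the kind of contrast being measured: a scalar loop calling std::exp per element versus a single Eigen expression that can use SIMD. This is purely illustrative and not the code from the pull:

#include <Eigen/Dense>
#include <algorithm>
#include <cmath>
#include <vector>

double log_sum_exp_loop(const std::vector<double>& x) {
  double m = *std::max_element(x.begin(), x.end());
  double sum = 0.0;
  for (double xi : x)
    sum += std::exp(xi - m);  // one libm call per element
  return m + std::log(sum);
}

double log_sum_exp_eigen(const std::vector<double>& x) {
  Eigen::Map<const Eigen::VectorXd> v(x.data(), x.size());   // wrap, don't copy
  double m = v.maxCoeff();
  return m + std::log((v.array() - m).exp().sum());          // vectorised exp over the block
}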

The bigger question I have is what’s the return type of something like softmax()? It sounds like the proposal uses a uniform return type

The proposed framework returns the same type as the input, which looks like:

vector softmax(vector);
row_vector softmax(row_vector);
real[] softmax(real[]);

For head, we absolutely need to keep our input and output types

That’s how it’s currently implemented.

Given the proposed definition of log_softmax, it’ll still only apply to Eigen types as they’re the only ones that support an appropriate .array() method. The document says that Eigen::Map will be used around std::vector, but I don’t see where that happens in the code examples.

The Eigen::Map happens in the apply_vector_unary struct. So the function itself only needs to be defined for Eigen types and std::vectors are automatically mapped before the function is applied.
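
For readers without the PR open, here is a stripped-down sketch of the dispatch idea (the PR’s apply_vector_unary is more general and also handles autodiff types): Eigen inputs go straight to the functor, while std::vector inputs are wrapped in an Eigen::Map first, so the functor only ever sees an Eigen type.

#include <Eigen/Dense>
#include <cmath>
#include <vector>

struct apply_vector_unary_sketch {
  // Eigen vectors / row vectors: apply the functor directly.
  template <typename Derived, typename F>
  static auto apply(const Eigen::MatrixBase<Derived>& x, const F& f) {
    return f(x).eval();
  }

  // std::vector: map it, apply, copy the result back out.
  template <typename F>
  static std::vector<double> apply(const std::vector<double>& x, const F& f) {
    Eigen::Map<const Eigen::VectorXd> x_map(x.data(), x.size());
    Eigen::VectorXd result = f(x_map);
    return std::vector<double>(result.data(), result.data() + result.size());
  }
};

// The function itself is then written once against the Eigen API.
template <typename T>
auto log_softmax_sketch(const T& x) {
  return apply_vector_unary_sketch::apply(x, [](const auto& v) {
    double lse = std::log(v.array().exp().sum());  // naive, no max-shift, for brevity
    return (v.array() - lse).matrix();
  });
}

With this shape, log_softmax_sketch accepts an Eigen vector, an Eigen row vector, or a std::vector<double>, and the return type follows the input type.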

Adding all these lambdas isn’t free. Their constructors and destructors will get called and assignments will be made. So I would suggest profiling the resulting functions, which will probably be dominated by exp and log anyway.

That’s a good idea, I’ll have a look into the performance testing side of things.


I’ve run some basic performance tests comparing log_sum_exp with Eigen column vectors and std::vectors under the old and new implementations:

Run on (8 X 4000 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x4)
  L1 Instruction 32 KiB (x4)
  L2 Unified 256 KiB (x4)
  L3 Unified 8192 KiB (x1)
Load Average: 1.18, 1.08, 1.29
----------------------------------------------------------
Benchmark                Time             CPU   Iterations
----------------------------------------------------------
LogSumExpArrNew       2410 ns         2410 ns       288157
LogSumExpArrOld      10434 ns        10434 ns        67087
LogSumExpMatNew       2349 ns         2349 ns       293460
LogSumExpMatOld       2420 ns         2420 ns       293488

There’s roughly a 4x speedup with std::vectors and a negligible difference with Eigen column vectors. So, broadly speaking, it looks like the new framework (lambdas and such) doesn’t degrade current performance.
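
The harness behind numbers like these would be along the lines of the following Google Benchmark setup; the actual benchmark code isn’t shown here, the sizes are made up, and the old-versus-new comparison comes from building against the develop branch versus the PR branch:

#include <benchmark/benchmark.h>
#include <stan/math.hpp>
#include <vector>

static void LogSumExpArr(benchmark::State& state) {
  std::vector<double> x(1024, 0.5);
  for (auto _ : state)
    benchmark::DoNotOptimize(stan::math::log_sum_exp(x));
}
BENCHMARK(LogSumExpArr);

static void LogSumExpMat(benchmark::State& state) {
  Eigen::VectorXd x = Eigen::VectorXd::Constant(1024, 0.5);
  for (auto _ : state)
    benchmark::DoNotOptimize(stan::math::log_sum_exp(x));
}
BENCHMARK(LogSumExpMat);

BENCHMARK_MAIN();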


I’ve opened up a PR with the proposed framework and examples for use with log_sum_exp, log_softmax, and head so that you can see the implementation in a bit more detail.


We don’t need consistency at this level. Just because we added something once doesn’t mean we have to add everything else like it in the future.

I’ve objected all along when people wanted to define functions on arrays like log-sum-exp. I’d be happy to deprecate those.

Doesn’t that only kick in when T = double?

We should rewrite our existing elementwise functions using Eigen assuming we can get the same precision. Then we can call those instead, which gives us more flexibility in the future for better autodiff when T != double.
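
In code, that could look something like the following sketch (not existing Stan Math code): a single templated elementwise function where the double instantiation falls through to an Eigen expression, and autodiff instantiations keep their own path where custom gradients can be slotted in later.

#include <Eigen/Dense>
#include <cmath>
#include <type_traits>

template <typename Derived>
typename Derived::PlainObject elementwise_exp(const Eigen::MatrixBase<Derived>& x) {
  if constexpr (std::is_same_v<typename Derived::Scalar, double>) {
    return x.array().exp().matrix();  // SIMD-friendly Eigen path for plain doubles
  } else {
    using std::exp;  // ADL picks up the autodiff scalar's exp
    typename Derived::PlainObject out(x.rows(), x.cols());
    for (Eigen::Index j = 0; j < x.cols(); ++j)
      for (Eigen::Index i = 0; i < x.rows(); ++i)
        out(i, j) = exp(x(i, j));
    return out;
  }
}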

Also, if we wind up wrapping everything in a Map for generality, won’t the efficiency be the same, at least for R = -1, C = 1 and R = 1, C = -1?