Refactoring math with more general signatures

stevebronder · August 13, 2019, 1:17pm

I’m working on adding sparse matrices to Stan. In math there are two ways to go about this.

Reformat the library so a lot of the general functions can take both dense and sparse matrices.
Add duplicate code that specializes on sparse matrices.

There has been a lot of discussion on the refactor for (1) in the past (see this Google groups post and this issue). Dan points out that Eigen coefficient access doesn’t work well with MatrixBase. Though Ben points out that A.coeff() for access instead of operator() would be sufficient to handle the coefficient access error Dan saw. There’s a few ways to generalize the code which I’ll cover below

TLDR; Using Matrix, MatrixBase, EigenBase, or auto almost always gives the same assembly output. We should just use the one that gives us the nicest looking code. If we are careful with auto we can and should use it for function signatures because it performs better on clang 6.

Eigen with `-O3`

The below will be the output of creating a returning a value from a function that calls A + B * A * B

We will use the following signatures

// Helper for return type
template <typename Derived>
using eigen_plain_t = typename Derived::PlainObject;

// Helper for Eigen Dynamic
template <typename T>
using eigen_mat_dyn = Eigen::Matrix<T, -1, -1>;

// With Standard Matrices
template <typename T1, typename T2>
inline eigen_mat_dyn<T1> add_mat(
    const eigen_mat_dyn<T1>& A,
    const eigen_mat_dyn<T2>& B) {
    eigen_mat_dyn<T1> C = A + B * A * B;
    return C;

}

// With Matrix Base
template <class Derived>
inline eigen_plain_t<Derived> add_mat_base(
    const Eigen::MatrixBase<Derived>& A,
    const Eigen::MatrixBase<Derived>& B) {
    auto C = A + B * A * B;
    return C;
}

// With Eigen Base
template <class Derived>
inline eigen_plain_t<Derived> add_base(
    const Eigen::EigenBase<Derived>& A_,
    const Eigen::EigenBase<Derived>& B_) {
   // We have to pull out the Eigen derivied types
    Derived const& A = A_.derived();
    Derived const& B = B_.derived();
    Derived C = A + B * A * B;
    return C;

}

// With auto
template <typename T1, typename T2>
inline auto add_decl_ret(const T1& A, const T2& B) {
    auto C = A + B * A * B;
    return C;
}

Eigen Funcs on GCC 4.9 and `-O3`

TLDR; All the above calls generate the same asm

Starting with the Eigen::CommaInitializer Every method uses pretty similar asm. Call .L1149, .L1150, .L1151, and then calling the internal scalar_sum_op templated by Eigen::PlainObjectBase

## Matrix
mov     rax, QWORD PTR [rsp+48]
cmp     rax, QWORD PTR [rsp+8]
jne     .L1149
mov     rdx, QWORD PTR [rsp+16]
cmp     rdx, QWORD PTR [rsp+40]
jne     .L1150
cmp     rax, rdx
mov     QWORD PTR [rsp+240], rsp
mov     QWORD PTR [rsp+248], rbx
mov     QWORD PTR [rsp+256], rsp
mov     QWORD PTR [rsp+264], rbx
jne     .L1151
lea     rsi, [rsp+240]
lea     rdi, [rsp+64]
call    Eigen::PlainObjectBase<Eigen::Matrix<double, -1, -1, 0, -1, -1> >::PlainObjectBase<Eigen::CwiseBinaryOp<Eigen::internal::scalar_sum_op<double, double>, Eigen::Matrix<double, -1, -1, 0, -1, -1> const, Eigen::Product<Eigen::Product<Eigen::Matrix<double, -1, -1, 0, -1, -1>, Eigen::Matrix<double, -1, -1, 0, -1, -1>, 0>, Eigen::Matrix<double, -1, -1, 0, -1, -1>, 0> const> >(Eigen::DenseBase<Eigen::CwiseBinaryOp<Eigen::internal::scalar_sum_op<double, double>, Eigen::Matrix<double, -1, -1, 0, -1, -1> const, Eigen::Product<Eigen::Product<Eigen::Matrix<double, -1, -1, 0, -1, -1>, Eigen::Matrix<double, -1, -1, 0, -1, -1>, 0>, Eigen::Matrix<double, -1, -1, 0, -1, -1>, 0> const> > const&)

## EigenBase
mov     rdx, QWORD PTR [rsp+48]
cmp     rdx, QWORD PTR [rsp+8]
jne     .L1149
mov     rax, QWORD PTR [rsp+16]
cmp     rax, QWORD PTR [rsp+40]
jne     .L1150
cmp     rdx, rax
mov     QWORD PTR [rsp+288], rsp
mov     QWORD PTR [rsp+296], rbx
mov     QWORD PTR [rsp+304], rsp
mov     QWORD PTR [rsp+312], rbx
jne     .L1151
lea     rsi, [rsp+288]
lea     rdi, [rsp+96]
call    Eigen::PlainObjectBase<Eigen::Matrix<double, -1, -1, 0, -1, -1> >::PlainObjectBase<Eigen::CwiseBinaryOp<Eigen::internal::scalar_sum_op<double, double>, Eigen::Matrix<double, -1, -1, 0, -1, -1> const, Eigen::Product<Eigen::Product<Eigen::Matrix<double, -1, -1, 0, -1, -1>, Eigen::Matrix<double, -1, -1, 0, -1, -1>, 0>, Eigen::Matrix<double, -1, -1, 0, -1, -1>, 0> const> >(Eigen::DenseBase<Eigen::CwiseBinaryOp<Eigen::internal::scalar_sum_op<double, double>, Eigen::Matrix<double, -1, -1, 0, -1, -1> const, Eigen::Product<Eigen::Product<Eigen::Matrix<double, -1, -1, 0, -1, -1>, Eigen::Matrix<double, -1, -1, 0, -1, -1>, 0>, Eigen::Matrix<double, -1, -1, 0, -1, -1>, 0> const> > const&)

## MatrixBase
mov     rdx, QWORD PTR [rsp+48]
cmp     rdx, QWORD PTR [rsp+8]
jne     .L1149
mov     rax, QWORD PTR [rsp+16]
cmp     rax, QWORD PTR [rsp+40]
jne     .L1150
cmp     rdx, rax
mov     QWORD PTR [rsp+336], rsp
mov     QWORD PTR [rsp+344], rbx
mov     QWORD PTR [rsp+352], rsp
mov     QWORD PTR [rsp+360], rbx
jne     .L1151
lea     rsi, [rsp+336]
lea     rdi, [rsp+128]
call    Eigen::PlainObjectBase<Eigen::Matrix<double, -1, -1, 0, -1, -1> >::PlainObjectBase<Eigen::CwiseBinaryOp<Eigen::internal::scalar_sum_op<double, double>, Eigen::Matrix<double, -1, -1, 0, -1, -1> const, Eigen::Product<Eigen::Product<Eigen::Matrix<double, -1, -1, 0, -1, -1>, Eigen::Matrix<double, -1, -1, 0, -1, -1>, 0>, Eigen::Matrix<double, -1, -1, 0, -1, -1>, 0> const> >(Eigen::DenseBase<Eigen::CwiseBinaryOp<Eigen::internal::scalar_sum_op<double, double>, Eigen::Matrix<double, -1, -1, 0, -1, -1> const, Eigen::Product<Eigen::Product<Eigen::Matrix<double, -1, -1, 0, -1, -1>, Eigen::Matrix<double, -1, -1, 0, -1, -1>, 0>, Eigen::Matrix<double, -1, -1, 0, -1, -1>, 0> const> > const&)

## Auto
mov     rax, QWORD PTR [rsp+48]
cmp     rax, QWORD PTR [rsp+8]
jne     .L1149
mov     rdx, QWORD PTR [rsp+16]
cmp     rdx, QWORD PTR [rsp+40]
jne     .L1150
cmp     rax, rdx
mov     QWORD PTR [rsp+192], rsp
mov     QWORD PTR [rsp+200], rbx
mov     QWORD PTR [rsp+208], rsp
mov     QWORD PTR [rsp+216], rbx
jne     .L1151
lea     rsi, [rsp+192]
lea     rdi, [rsp+160]
call    Eigen::PlainObjectBase<Eigen::Matrix<double, -1, -1, 0, -1, -1> >::PlainObjectBase<Eigen::CwiseBinaryOp<Eigen::internal::scalar_sum_op<double, double>, Eigen::Matrix<double, -1, -1, 0, -1, -1> const, Eigen::Product<Eigen::Product<Eigen::Matrix<double, -1, -1, 0, -1, -1>, Eigen::Matrix<double, -1, -1, 0, -1, -1>, 0>, Eigen::Matrix<double, -1, -1, 0, -1, -1>, 0> const> >(Eigen::DenseBase<Eigen::CwiseBinaryOp<Eigen::internal::scalar_sum_op<double, double>, Eigen::Matrix<double, -1, -1, 0, -1, -1> const, Eigen::Product<Eigen::Product<Eigen::Matrix<double, -1, -1, 0, -1, -1>, Eigen::Matrix<double, -1, -1, 0, -1, -1>, 0>, Eigen::Matrix<double, -1, -1, 0, -1, -1>, 0> const> > const&)

## Product With Addition
.L1151:
        mov     ecx, OFFSET FLAT:Eigen::CwiseBinaryOp<Eigen::internal::scalar_sum_op<double, double>, Eigen::Matrix<double, -1, -1, 0, -1, -1> const, Eigen::Product<Eigen::Product<Eigen::Matrix<double, -1, -1, 0, -1, -1>, Eigen::Matrix<double, -1, -1, 0, -1, -1>, 0>, Eigen::Matrix<double, -1, -1, 0, -1, -1>, 0> const>::CwiseBinaryOp(Eigen::Matrix<double, -1, -1, 0, -1, -1> const&, Eigen::Product<Eigen::Product<Eigen::Matrix<double, -1, -1, 0, -1, -1>, Eigen::Matrix<double, -1, -1, 0, -1, -1>, 0>, Eigen::Matrix<double, -1, -1, 0, -1, -1>, 0> const&, Eigen::internal::scalar_sum_op<double, double> const&)::__PRETTY_FUNCTION__
        mov     edx, 110
        mov     esi, OFFSET FLAT:.LC45
        mov     edi, OFFSET FLAT:.LC46
        call    __assert_fail
## Product Product
.L1150:
        mov     ecx, OFFSET FLAT:Eigen::Product<Eigen::Product<Eigen::Matrix<double, -1, -1, 0, -1, -1>, Eigen::Matrix<double, -1, -1, 0, -1, -1>, 0>, Eigen::Matrix<double, -1, -1, 0, -1, -1>, 0>::Product(Eigen::Product<Eigen::Matrix<double, -1, -1, 0, -1, -1>, Eigen::Matrix<double, -1, -1, 0, -1, -1>, 0> const&, Eigen::Matrix<double, -1, -1, 0, -1, -1> const&)::__PRETTY_FUNCTION__
        mov     edx, 97
        mov     esi, OFFSET FLAT:.LC4
        mov     edi, OFFSET FLAT:.LC5
        call    __assert_fail
## Product from a temporary
.L1149:
        call    Eigen::Product<Eigen::Matrix<double, -1, -1, 0, -1, -1>, Eigen::Matrix<double, -1, -1, 0, -1, -1>, 0> const Eigen::MatrixBase<Eigen::Matrix<double, -1, -1, 0, -1, -1> >::operator*<Eigen::Matrix<double, -1, -1, 0, -1, -1> >(Eigen::MatrixBase<Eigen::Matrix<double, -1, -1, 0, -1, -1> > const&) const [clone .part.30]
        mov     rdi, QWORD PTR [rsp+160]
        mov     rbx, rax
        call    free

So, essentially we can pick whatever we want as the signature here (excluding the instances where Eigen shows that auto can segfault) and for 4.9.3 we will get the same stuff.

On a side note, using a newer version of GCC gives the same asm for everything except for Matrix, which does requires two extra moves and an unpack. The other calls all give the same asm.

Eigen Funcs on Clang 6 and `-O3`

TLDR: All the asm is the same except for `auto` which is actually inlined and a bit more optmized

This is waaaaayy more complicated. But essentially everything is also the same but clang 6 loves to duplicate code. The only interesting thing below is that the auto version actually inlines while the others are forced to make a call.

## Matrix
Eigen::Matrix<double, -1, -1, 0, -1, -1> add_mat<double, double>(Eigen::Matrix<double, -1, -1, 0, -1, -1> const&, Eigen::Matrix<double, -1, -1, 0, -1, -1> const&): # @Eigen::Matrix<double, -1, -1, 0, -1, -1> add_mat<double, double>(Eigen::Matrix<double, -1, -1, 0, -1, -1> const&, Eigen::Matrix<double, -1, -1, 0, -1, -1> const&)
        push    r14
        push    rbx
        sub     rsp, 56
        mov     rax, rsi
        mov     rbx, rdi
        mov     rsi, qword ptr [rdx + 16]
        cmp     rsi, qword ptr [rax + 8]
## .LBB*_10 are the same product
        jne     .LBB1_10
        mov     rcx, qword ptr [rax + 16]
        cmp     rcx, qword ptr [rdx + 8]
## .LBB*_11 are all the same product
        jne     .LBB1_11
        mov     qword ptr [rsp + 16], rax
        mov     qword ptr [rsp + 24], rdx
        mov     qword ptr [rsp + 32], rax
        mov     qword ptr [rsp + 40], rdx
        cmp     rsi, rcx
## .LBB*_12 are all the same addition
        jne     .LBB1_12
        xorps   xmm0, xmm0
        movups  xmmword ptr [rbx], xmm0
        mov     qword ptr [rbx + 16], 0
        test    rsi, rsi
## .LBB*_7 are all the same product
        je      .LBB1_7
        movabs  rax, 9223372036854775807
        xor     edx, edx
        idiv    rsi
        cmp     rax, rsi
        jl      .LBB1_5

## EigenBase
Eigen::Matrix<double, -1, -1, 0, -1, -1>::PlainObject add_base<Eigen::Matrix<double, -1, -1, 0, -1, -1>, Eigen::Matrix<double, -1, -1, 0, -1, -1> >(Eigen::EigenBase<Eigen::Matrix<double, -1, -1, 0, -1, -1> > const&, Eigen::EigenBase<Eigen::Matrix<double, -1, -1, 0, -1, -1> > const&): # @Eigen::Matrix<double, -1, -1, 0, -1, -1>::PlainObject add_base<Eigen::Matrix<double, -1, -1, 0, -1, -1>, Eigen::Matrix<double, -1, -1, 0, -1, -1> >(Eigen::EigenBase<Eigen::Matrix<double, -1, -1, 0, -1, -1> > const&, Eigen::EigenBase<Eigen::Matrix<double, -1, -1, 0, -1, -1> > const&)
        push    r14
        push    rbx
        sub     rsp, 56
        mov     rax, rsi
        mov     rbx, rdi
        mov     rsi, qword ptr [rdx + 16]
        cmp     rsi, qword ptr [rax + 8]
## .LBB*_10 are the same product
        jne     .LBB2_10
        mov     rcx, qword ptr [rax + 16]
        cmp     rcx, qword ptr [rdx + 8]
## .LBB*_11 are all the same product
        jne     .LBB2_11
        mov     qword ptr [rsp + 16], rax
        mov     qword ptr [rsp + 24], rdx
        mov     qword ptr [rsp + 32], rax
        mov     qword ptr [rsp + 40], rdx
        cmp     rsi, rcx
## .LBB*_12 are all the same addition
        jne     .LBB2_12
        xorps   xmm0, xmm0
        movups  xmmword ptr [rbx], xmm0
        mov     qword ptr [rbx + 16], 0
        test    rsi, rsi
  ## .LBB*_7 are all the same product
        je      .LBB2_7
        movabs  rax, 9223372036854775807
        xor     edx, edx
        idiv    rsi
        cmp     rax, rsi
        jl      .LBB2_5

## MatrixBase
Eigen::Matrix<double, -1, -1, 0, -1, -1>::PlainObject add_mat_base<Eigen::Matrix<double, -1, -1, 0, -1, -1>, Eigen::Matrix<double, -1, -1, 0, -1, -1> >(Eigen::MatrixBase<Eigen::Matrix<double, -1, -1, 0, -1, -1> > const&, Eigen::MatrixBase<Eigen::Matrix<double, -1, -1, 0, -1, -1> > const&): # @Eigen::Matrix<double, -1, -1, 0, -1, -1>::PlainObject add_mat_base<Eigen::Matrix<double, -1, -1, 0, -1, -1>, Eigen::Matrix<double, -1, -1, 0, -1, -1> >(Eigen::MatrixBase<Eigen::Matrix<double, -1, -1, 0, -1, -1> > const&, Eigen::MatrixBase<Eigen::Matrix<double, -1, -1, 0, -1, -1> > const&)
        push    r14
        push    rbx
        sub     rsp, 56
        mov     rax, rsi
        mov     rbx, rdi
        mov     rsi, qword ptr [rdx + 16]
        cmp     rsi, qword ptr [rax + 8]
## .LBB*_10 are the same product
        jne     .LBB3_10
        mov     rcx, qword ptr [rax + 16]
        cmp     rcx, qword ptr [rdx + 8]
## .LBB*_11 are all the same product
        jne     .LBB3_11
        mov     qword ptr [rsp + 16], rax
        mov     qword ptr [rsp + 24], rdx
        mov     qword ptr [rsp + 32], rax
        mov     qword ptr [rsp + 40], rdx
        cmp     rsi, rcx
## .LBB*_12 are all the same addition
        jne     .LBB3_12
        xorps   xmm0, xmm0
        movups  xmmword ptr [rbx], xmm0
        mov     qword ptr [rbx + 16], 0
        test    rsi, rsi
  ## .LBB*_7 are all the same product
        je      .LBB3_7
        movabs  rax, 9223372036854775807
        xor     edx, edx
        idiv    rsi
        cmp     rax, rsi
        jl      .LBB3_5

## Auto is the only one that's inlined in main!?!?
mov     rsi, qword ptr [rsp + 32]
 cmp     rsi, qword ptr [rsp + 56]
# Product
 jne     .LBB0_76
 mov     rax, qword ptr [rsp + 64]
 cmp     rax, qword ptr [rsp + 24]
# Product
 jne     .LBB0_77
 cmp     rsi, rax
# Addition
 jne     .LBB0_78
 mov     qword ptr [rsp + 184], rbx
 mov     qword ptr [rsp + 192], r14
 mov     qword ptr [rsp + 200], rbx
 mov     qword ptr [rsp + 208], r14
 xorps   xmm0, xmm0
 movaps  xmmword ptr [rsp + 80], xmm0
 mov     qword ptr [rsp + 96], 0
 test    rsi, rsi
# Product
 je      .LBB0_46

Conclusions

I think auto and generic templates are a good idea for a lot of the functions in the math library where the functions are simple. I.E. using

template <typename T1, typename T2, typename = enable_if_all_eigen<T1, T2>>
auto add(const T1&, const T2&) {
  return A + B;
}

is fine. For the simpler things like addition, subtraction, and multiplication this will let those work for sparse and dense matrices. This will give us the same performance on gcc with slightly better performance on clang. Though we should probs not use auto for variable declarations.

rok_cesnovar · August 13, 2019, 5:39pm

The auto code looks really nice. Maybe make a WIP PR branch and replace a bunch of them and lets see if there is more there?

Just a general question, what is the benefit of using EigenBase compared to MatrixBase? I understand that by uisng MatrixBase we can use the same function with Eigen::MatrixXd and Eigen::Map, etc
But what do we gain with EigenBase? I am an idiot, its probably Eigen::Vector…

stevebronder · August 14, 2019, 9:07am

See the PR here!

github.com/stan-dev/math

[WIP] Use auto and more general template types

stan-dev:develop ← bstatcomp:code-cleanup/auto

opened 08:59AM - 14 Aug 19 UTC

SteveBronder

+1571 -1295

## Summary As described in [this](https://discourse.mc-stan.org/t/refactoring…-math-with-more-general-signatures/10157) discourse thread I think we can get away with using auto in the return types for simple functions like `add`, `subtract`, and `multiply`. But to do that accept general Eigen types I had to do a few things 1. For simple functions like add, subtract, multiply, we use a signature that looks like ```cpp template <typename T1, typename T2, int R, int C> inline Eigen::Matrix<return_type_t<T1, T2>, R, C> add( const Eigen::Matrix<T1, R, C>& m1, const Eigen::Matrix<T2, R, C>& m2) { ``` We now use ```cpp template <typename Mat1, typename Mat2, typename = enable_if_all_eigen<Mat1, Mat2>> inline auto add(const Mat1& m1, const Mat2& m2) { ``` This is nice because `enable_if_all_eigen` checks that the template type inherits from `EigenBase`, which means that Mat1 and Mat2 can be types like `Eigen::Product<...>` or `Eigen::Diagonal<...>`. In order to differ to the correct function a large number of `enable_if_*` methods have been made available. 2. In order to make sure I'm pulling out the types and sub types correctly I had to update the metaprogramming structs so that the eigen based meta programming methods work with more than just eigen matrices. In general I tried to go through them and use things like `std::decay_t` so we don't need specializations for const etc. 3. By cleaning up the metaprogramming we can also have the `apply_scalar_unary` also support any type of Eigen matrix. This should remove a lot of copies and let us use Sparse matrices in the future. 4. Most of the functions in `fwd` are really specializations for Eigen matrices with `fvar` types. With the use of this refactor we can remove a lot of those methods in fwd and default to the prim implementations. There are only one or two methods which lose precision (to the level of 1e-15) and I'll doc/note those in this PR so we can talk about whether it's worth having another method and more precision or just using the default methods in prim. (on a side note, maybe not for this PR but it may be nice to change the name of `prim` to `generic` since that's the real intent of that folder) ## Tests I think the original tests should be fine? I would like to also add tests for arithmetic sparse matrices since that's the intent of this PR ## Side Effects Not sure if it's a side effect, but these methods will now work for all classes that inherit from `EigenBase`. ## Checklist - [x] Math issue #1308 - [x] Copyright holder: Steve Bronder - [x] the basic tests are passing - unit tests pass (to run, use: `./runTests.py test/unit`) - header checks pass, (`make test-headers`) - docs build, (`make doxygen`) - code passes the built in [C++ standards](https://github.com/stan-dev/stan/wiki/Code-Quality) checks (`make cpplint`) - [x] the code is written in idiomatic C++ and changes are documented in the doxygen - [x] the new changes are tested

But what do we gain with EigenBase ? I am an idiot, its probably Eigen::Vector…

So in the Eigen docs here they say to use EigenBase to avoid making temporaries, but the above stuff shows that compilers treat EigenBase, MatrixBase, and Matrix argument types the same**

**In gcc 9.1 using EigenBase or MatrixBase does avoid 1 move and 1 unpack operation wrt using Matrix

rok_cesnovar · August 14, 2019, 12:05pm

Great, lets discuss this on tomorrows Math meeting, seems like a refactor that could potentially benefit everyone, not just for sparse matrices.

syclik · August 14, 2019, 3:43pm

I’m all for this! We couldn’t really do this when we were C++03 because of the combinatorial explosion of function signatures, but we can now with auto!!!

We still have to be careful. I’m not certain that we’re going to get the correct results back out of Eigen. Unfortunately, all the different views / types do have different implementations. This caught us out before, but hopefully we can make some progress this time around. This would be so awesome for speed too if we’re able to propagate the expression templates forward.

Yes, let’s talk about it tomorrow!

Bob_Carpenter · August 20, 2019, 8:48pm

We should be moving from operator() and operator[] (just vectors) to .coeff. It doesn’t check ranges, but we do that everywhere ourselves, so the checks are (largely? totally?)

Cool! Presumably it’s not the same code with Matrix and MatrixBase if you pass it an expression template; I’d have thought Matrix would .eval() and the base not.

It can be problematic as they note in the Eigen doc if you wind up saving something unevaluated rather than copied as it can imply a lot more eval downstream.

add can be a bit more general in that T1 or T2 can be scalars.

tadej · October 25, 2019, 8:45am

@stevebronder did you hit any problems with this? If you don’t have the time for this I could tackle it. Regardless of sparse matrix support I think this should improve performance of existing functionalities.

stevebronder · October 25, 2019, 3:45pm

No problems but did lose track of this. Help would def be appreciated! I have a PR I need to clean up in the stan repo that uses a bunch of this stuff

Topic		Replies	Views
Stan-math: Eigen support General stan-math	13	1832	July 4, 2017
Math library input checks Developers	53	2652	March 7, 2019
Can't "multiply" Eigen::Tensors General stan-math	2	2299	August 6, 2017
Stan::math's quad_form and Eigen matrix multiplication General stan-math	4	770	March 4, 2023
Eigen 3.4 notes Developers math	1	915	October 31, 2019

Refactoring math with more general signatures

Eigen with -O3

Eigen Funcs on GCC 4.9 and -O3

TLDR; All the above calls generate the same asm

Eigen Funcs on Clang 6 and -O3

TLDR: All the asm is the same except for auto which is actually inlined and a bit more optmized

Conclusions

Related topics

Eigen with `-O3`

Eigen Funcs on GCC 4.9 and `-O3`

Eigen Funcs on Clang 6 and `-O3`

TLDR: All the asm is the same except for `auto` which is actually inlined and a bit more optmized