Help testing release candidate for std::complex

@philipm Thanks, is this using Eigen-AD or standard eigen?

Eigen-AD

Thanks, I’ll be keeping a lookout for the Eigen-AD repo when it’s published ;) In the meantime, I’m going to look at the NumTraits template and this file https://github.com/stan-dev/math/blob/f1e8cf7ec2cc28f4bfd84b5515aed58d5dab85c2/lib/eigen_3.3.3/Eigen/src/Core/GeneralProduct.h
to see if I can further modify what it’s doing.
General / side note: I got a new work laptop yesterday. I’m still setting up the build environment. There are extra restrictions on it vs. my old one that make it harder to post here (Chrome sync is disabled adminstratively, so no bookmarks or password manager… and I use auto-generated random passwords for everything, including github’s oauth…)

Right, you need to explicitly allow such operations. Eigen-AD added a dummy template parameter to ScalarBinaryOpTraits in order to specialize it conveniently using SFINAE.

As long as you provide the matching scalar overloads for your AD data type with complex, you should be good to go.

I did that in my summer 2018 implementation, and it was added in Bob’s above, but when we do something like Matrix<complex,-1,-1> * Matrix<double,-1,-1>, it fails to compile (note dynamic size – fixed sizes work). For dynamic size, if we use lazyProduct or if we explicitly cast the 2nd argument to Matrix<var,-1,-1> it works. Is there anything else you can think of that you may have modified, perhaps casting the non-AD matrix to AD first?

Thanks much for continuing to help out with this. Does your implementation using the built-in std::complex<T> implementations also have this problem?

I filed one of the original issues with Eigen to support mixed types around seven years ago, so we’ve been struggling with this for quite a while. We wound up just overloading everything in Stan for the autodiff types and using our own linear algebra internally. That’s wasteful in a lot of cases compared to using their expression templates, because we always create intermediates.

But when we add complex, the combinatorics get out of hand, with 28 overloads of combinations of primitives, autodiff variables, and complex:

(6) double * double, var, fvar<T>, complex<double>, complex<var>, complex<fvar<T>>

(4) var * double, var, complex<double>, complex<var>

(4) fvar<T> * double, fvar<T>, complex<double>, complex<fvar<T>> 

(6) complex<double> * double, var, fvar<T>, complex<double>, complex<var>, complex<fvar<T>>

(4) complex<var> * double, var, complex<double>, complex<var>

(4) complex<fvar<T>> * double, fvar<T>, complex<double>, complex<fvar<T>>

But even so, I’m not sure how we’d insert such overloads so that Eigen would use them for expression templates internally.

We have some Eigen traits definitions for Stan in:

stan/math/prim/mat/eigen_plugins.h
stan/math/fwd/mat/fun/Eigen_NumTraits.hpp
stan/math/rev/mat/fun/Eigen_NumTraits.hpp

Confirmed.

As @ChrisChiasson pointed out, that doesn’t seem to be enough no matter how we define the ScalarBinaryOpTraits. I think @ChrisChiasson tracked it down to an operation where with dynamic matrices, there is a cast of the second argument type to the first (or vice-versa) rather than casting up to the result type. When we have complex<double> and complex<var>, there’s no way to cast the second to the first. We don’t want to just literally support the cast because then we’d lose derivative information.

If I can’t modify the behavior to switch it to the correct complex handler via NumTraits or some other trick — [and it’s pretty obvious it is somehow not on the correct complex*non-complex path, otherwise it would not attempt to assign complex to double so deep in the algorithm… (getFactor I can understand, because they created a complex handler specifically for it that just wasn’t triggering in our case, but the errors after fixing that part indicate it is using something that isn’t aware of the non-complex right hand side)] —

So, one absolutely insane idea would be to try to create a template restricted overload to the “unary” (implicit this) operator* for MatrixBase so that the return type would no longer (necessarily) be a reference to the current matrix, but would instead (possibly) create a new matrix of the correct type in the specific cases we’re talking about here.

Sorry, was on vacation (and am trying to start my own business…)
Partial solution diff from Eigen 3.4 below. Allows left multiply of double * complex var with dynamic size matrices. Requires get_factor from above. Results are correct.

diff --git a/Eigen/src/Core/products/GeneralBlockPanelKernel.h b/Eigen/src/Core/products/GeneralBlockPanelKernel.h
index 2ae8ece98..9513179a2 100644
--- a/Eigen/src/Core/products/GeneralBlockPanelKernel.h
+++ b/Eigen/src/Core/products/GeneralBlockPanelKernel.h
@@ -513,23 +513,40 @@ public:
   }
 
   template<typename LhsPacketType, typename RhsPacketType, typename AccPacketType, typename LaneIdType>
-  EIGEN_STRONG_INLINE void madd(const LhsPacketType& a, const RhsPacketType& b, AccPacketType& c, RhsPacketType& tmp, const LaneIdType&) const
+  EIGEN_STRONG_INLINE
+  typename internal::enable_if<!internal::is_same<RhsPacketType, RhsPacketx4>::value, void>::type
+  madd(const LhsPacketType& a, const RhsPacketType& b, AccPacketType& c, AccPacketType& tmp, const LaneIdType&) const
   {
-    conj_helper<LhsPacketType,RhsPacketType,ConjLhs,ConjRhs> cj;
     // It would be a lot cleaner to call pmadd all the time. Unfortunately if we
     // let gcc allocate the register in which to store the result of the pmul
     // (in the case where there is no FMA) gcc fails to figure out how to avoid
     // spilling register.
 #ifdef EIGEN_HAS_SINGLE_INSTRUCTION_MADD
+    conj_helper<LhsPacketType,RhsPacketType,ConjLhs,ConjRhs> cj;
     EIGEN_UNUSED_VARIABLE(tmp);
     c = cj.pmadd(a,b,c);
 #else
+    conj_helper<LhsPacketType,AccPacketType,ConjLhs,ConjRhs> cj;
     tmp = b; tmp = cj.pmul(a,tmp); c = padd(c,tmp);
 #endif
   }
 
+  template<typename LhsPacketType, typename RhsPacketType, typename LaneIdType,
+   typename AccPacketType= typename ScalarBinaryOpTraits<LhsPacketType, RhsPacketType>::ReturnType>
+  EIGEN_STRONG_INLINE
+  typename internal::enable_if<!internal::is_same<RhsPacketType, RhsPacketx4>::value &&
+   !internal::is_convertible<AccPacketType, RhsPacketType>::value,void>::type
+  madd(const LhsPacketType& a, const RhsPacketType& b, AccPacketType& c, RhsPacketType&, const LaneIdType& lane) const
+  {
+   // What should really be done is that all callsites matching this template
+   // where B0 is passed as the argument for tmp should be changed so that T0
+   // is supplied instead, but I'm trying not to touch the rest of the code.
+   AccPacketType tmp;
+   madd(a, b, c, tmp, lane);
+  }
+
   template<typename LhsPacketType, typename AccPacketType, typename LaneIdType>
-  EIGEN_STRONG_INLINE void madd(const LhsPacketType& a, const RhsPacketx4& b, AccPacketType& c, RhsPacket& tmp, const LaneIdType& lane) const
+  EIGEN_STRONG_INLINE void madd(const LhsPacketType& a, const RhsPacketx4& b, AccPacketType& c, AccPacket& tmp, const LaneIdType& lane) const
   {
     madd(a, b.get(lane), c, tmp, lane);
   }
@@ -1409,7 +1426,7 @@ struct lhs_process_one_packet
         {
           EIGEN_ASM_COMMENT("begin gebp micro kernel 1/half/quarterX4");
           RhsPacketx4 rhs_panel;
-          RhsPacket T0;
+          AccPacket T0;
 
           internal::prefetch(blB+(48+0));
           peeled_kc_onestep(0, blA, blB, traits, &A0, &rhs_panel, &T0, &C0, &C1, &C2, &C3);
@@ -1436,7 +1453,7 @@ struct lhs_process_one_packet
         for(Index k=peeled_kc; k<depth; k++)
         {
           RhsPacketx4 rhs_panel;
-          RhsPacket T0;
+          AccPacket T0;
           peeled_kc_onestep(0, blA, blB, traits, &A0, &rhs_panel, &T0, &C0, &C1, &C2, &C3);
           blB += 4*RhsProgress;
           blA += LhsProgress;
@@ -1626,7 +1643,7 @@ void gebp_kernel<LhsScalar,RhsScalar,Index,DataMapper,mr,nr,ConjugateLhs,Conjuga
             EIGEN_ASM_COMMENT("begin gebp micro kernel 3pX4");
             // 15 registers are taken (12 for acc, 2 for lhs).
             RhsPanel15 rhs_panel;
-            RhsPacket T0;
+            AccPacket T0;
             LhsPacket A2;
             #if EIGEN_COMP_GNUC_STRICT && EIGEN_ARCH_ARM64 && defined(EIGEN_VECTORIZE_NEON) && !(EIGEN_GNUC_AT_LEAST(9,0))
             // see http://eigen.tuxfamily.org/bz/show_bug.cgi?id=1633
@@ -1686,7 +1703,7 @@ void gebp_kernel<LhsScalar,RhsScalar,Index,DataMapper,mr,nr,ConjugateLhs,Conjuga
           for(Index k=peeled_kc; k<depth; k++)
           {
             RhsPanel15 rhs_panel;
-            RhsPacket T0;
+            AccPacket T0;
             LhsPacket A2;
             EIGEN_GEBP_ONESTEP(0);
             blB += 4*RhsProgress;
@@ -1869,7 +1886,7 @@ void gebp_kernel<LhsScalar,RhsScalar,Index,DataMapper,mr,nr,ConjugateLhs,Conjuga
           {
             EIGEN_ASM_COMMENT("begin gebp micro kernel 2pX4");
             RhsPacketx4 rhs_panel;
-            RhsPacket T0;
+            AccPacket T0;
 
           // NOTE: the begin/end asm comments below work around bug 935!
           // but they are not enough for gcc>=6 without FMA (bug 1637)
@@ -1916,7 +1933,7 @@ void gebp_kernel<LhsScalar,RhsScalar,Index,DataMapper,mr,nr,ConjugateLhs,Conjuga
           for(Index k=peeled_kc; k<depth; k++)
           {
             RhsPacketx4 rhs_panel;
-            RhsPacket T0;
+            AccPacket T0;
             EIGEN_GEBGP_ONESTEP(0);
             blB += 4*RhsProgress;
             blA += 2*Traits::LhsProgress;

Suggests several ways for a final solution:
Modify (or otherwise overload) gebp_traits’ madd more closely to just the stan complex types and their co-operands, or finish the more general modification shown here. One of the main impediments to further progress is disentangling the variables the program calls T0 and B0, B1, B2… The general case of the program treats T0 and B# as if they are type compatible, which is wrong in the case where T0 is complex and B# isn’t. T0 ends up as tmp in the madd call. B# ends up as b. In general, since T0 acts as temporary storage for a multiplication of A and B, then T must have the type AccPacketType. That is the change I implemented. The other modifications are just dealing with the side effects, because it causes function ambiguities once the madd arguments are corrected for T/tmp to use AccPacketType. For example, the reason why I don’t yet have dynamic size right multiply by double working is because fixing the T0 type at the call sites causes a problem with a function that is only used in that situation, and leaving out the T0 change for just that case causes a different problem.

Anyway, I’ll be back after the start of the year, but I have an urgent, hopefully short, project or two that I will be forced to knock out, so I can’t get back on this immediately… That’s why I’m posting incomplete results here, since it seems like enough information to crack the case.

Forgot to write why this is important. A lot of the problems in the present code come from the mixing of the purposes and types of the temporary variables and the variables from the double operand. In some cases, even within one algorithm’s usage of madd’s 4th argument, there are cases where B is provided, and then later T is provided. That’s no bueno if B and T are different.

If we can get a tight overload, possibly on the overall gebp_traits struct (via template specialization and inheritance of the more general template), we can provide an madd that discards its 4th argument, and set up something like a (possibly thread-local stan-threads define triggered) magic static to act as the tmp variable, which would not be much overhead considering that our complex types are involved. Then, we can make the 4th argument be a template argument that fits anything and doesn’t even give the variable a name, making it irrelevant whatever the calling code passes in.

edit 2019-12-31 since this is the 3rd reply and it won’t let me post a 4th:

I can elaborate further, but this has proven to be a pain, therefore: add madd_no_fma_helper (!37) · Merge requests · libeigen / eigen · GitLab.

Also, the get_factor from above needs to be spelled the following way to not create ambiguity with the normal version of the template for, say, complex double times double:

namespace Eigen::internal{
template<class T1,class T2>
struct get_factor<std::complex<stan::math::fvar<T1>>,T2>{
 EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T2
  run(std::complex<stan::math::fvar<T1>>const&x){
   return stan::math::val<stan::math::fvar<T1>,T2>(numext::real(x));
 }
};
template<class T2>
struct get_factor<std::complex<stan::math::var>,T2>{
 EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T2
  run(std::complex<stan::math::var>const&x){
   return stan::math::val<stan::math::var,T2>(numext::real(x));
 }
};
} // namespace Eigen::internal
1 Like

After reading your latest posts I just remembered that I had run into issues with madd before but this was not related to complex types but only plain mixing of the AD type with passives. I fixed it back then by introducing a new template parameter:

  template<typename LhsPacketType, typename RhsPacketType, typename LhsAccPacketType, typename RhsAccPacketType>
  EIGEN_STRONG_INLINE void madd(const LhsPacketType& a, const RhsPacketType& b, LhsAccPacketType& c, RhsAccPacketType& tmp) const
  {
    conj_helper<LhsPacketType,RhsPacketType,ConjLhs,ConjRhs> cj;
    // It would be a lot cleaner to call pmadd all the time. Unfortunately if we
    // let gcc allocate the register in which to store the result of the pmul
    // (in the case where there is no FMA) gcc fails to figure out how to avoid
    // spilling register.
#ifdef EIGEN_HAS_SINGLE_INSTRUCTION_MADD
    EIGEN_UNUSED_VARIABLE(tmp);
    c = cj.pmadd(a,b,c);
#else
    tmp = b; tmp = cj.pmul(a,tmp); c = padd(c,tmp);
#endif
  }

I didn’t mind to much about this change (in the context of “Does it make sense?”) as it fixed all my compilation issues and hasn’t caused any other misbehavior - at least judging from the test system results.

It appears that you are deeper into the topic, interesting stuff!

@Bob_Carpenter @ChrisChiasson I see this thread has been inactive for a while, so don’t know if this is something you are still actively working on, or whether discussion has moved to another thread?

In any case the Eigen-AD repository is now publicly available linked to from here.

I spent a bit of time trying to reproduce the issues you have been having. Initially I thought that swapping Eigen 3.3.3 for Eigen-AD resolves these issues, but now it seems that my little toy example works with both versions of Eigen. I.e. I can multiply a matrix of double variables with a vector of std::complex<stan::math::var> variables.

I would be interested to know whether there are still unresolved issues and whether you are planning to merge this feature branch into the development branch anytime soon?

Here is a complete example that works on my machine (with Eigen 3.3.3),

// Note that prototype for supporting complex arithmetic of Stan types is currently in the following branch
// https://github.com/stan-dev/math/tree/feature/0123-complex-var
#include <stan/math.hpp>

template<typename BinaryOp>
struct Eigen::ScalarBinaryOpTraits<std::complex<stan::math::var>, double, BinaryOp> {
  typedef std::complex<stan::math::var> ReturnType;
};

template<typename BinaryOp>
struct Eigen::ScalarBinaryOpTraits<double, std::complex<stan::math::var>, BinaryOp> {
  typedef std::complex<stan::math::var> ReturnType;
};

typedef Eigen::Matrix<stan::math::var, Eigen::Dynamic, Eigen::Dynamic> Stan_MT;
typedef Eigen::Matrix<std::complex<stan::math::var>, Eigen::Dynamic, Eigen::Dynamic> Stan_CMT;
typedef Eigen::Matrix<std::complex<stan::math::var>, Eigen::Dynamic, 1> Stan_CVT;

using namespace std;
using namespace Eigen;

int main(){
  try{
    Stan_MT A = Stan_MT::Random(6,6);
    cout << "Here is a random 6x6 matrix, A:" << endl << A << endl << endl;

    EigenSolver<Stan_MT> es(A);
    cout << "The eigenvalues of A are:" << endl << es.eigenvalues() << endl << endl;

    Stan_CMT V = es.eigenvectors(); // this line causes a compiler error
    cout << "The matrix of eigenvectors, V, is:" << endl << es.eigenvectors() << endl << endl;

    complex<stan::math::var> lambda = es.eigenvalues()[0];
    cout << "Consider the first eigenvalue, lambda = " << lambda << endl;

    Stan_CVT v = es.eigenvectors().col(0);
    cout << "If v is the corresponding eigenvector, then lambda * v = " << endl << lambda * v << endl;

    Eigen::Matrix<double, Eigen::Dynamic, Eigen::Dynamic> A_double = Eigen::MatrixXd::Random(6,6);
    Stan_CVT x1_cvt = A_double * v;

  } catch (const std::exception& e){
    std::cout << " a standard exception was caught, with message '"
              << e.what() << "'\n";
  }

  return 0;
}

Thanks for following up. This is still active. My branch is in the staging process and is just waiting for the next Stan math release to remove the logjam of pull requests. So I expect the activity is going to pick up in a week or two.

That looks great. Does it depend on Eigen-AD or just Eigen 3.3.3? And does it also work for multiplying Matrix<complex<double>, ...> and Marix<complex<var>, ...>?

It can depend on either Eigen 3.3.3. or Eigen-AD (they both work).

Yes I have now added this piece of code to my toy example,

    Matrix<complex<stan::math::var>, Dynamic, 1> v = es.eigenvectors().col(0);
    Matrix<complex<double>, Dynamic, Dynamic> A_complex = MatrixXcd::Random(6,6);
    Matrix<complex<stan::math::var>, Dynamic, 1> x2_cvt = A_complex * v;

And added the following ReturnType definitions,

template<typename BinaryOp>
struct Eigen::ScalarBinaryOpTraits<std::complex<stan::math::var>, std::complex<double>, BinaryOp> {
  typedef std::complex<stan::math::var> ReturnType;
};

template<typename BinaryOp>
struct Eigen::ScalarBinaryOpTraits<std::complex<double>, std::complex<stan::math::var>, BinaryOp> {
  typedef std::complex<stan::math::var> ReturnType;
};

And that works.

I am a bit surprised at this given the lengthy discussion above and how challenging it seemed. I will keep a watch out for more activity on this.

That’s really great. I had similar types declared.

I’m trying to resolve all the merge conflicts now with our develop branch. There’s been a lot of refactoring of directory structure, which I’ve dealt with on the include side, and enable-if, which is currently causing problems with ambiguity and pow. That shouldn’t take long to sort out.

I tried very similar overloads and could not get them to work. I’m going to go over exactly what’s different between your structs and the ones that I had currently commented out in the branch.

The 3rd line is multiplying complex double times complex var, rather than complex times non-complex.

Haven’t checked to see if this would fix the problem for stan, but I think it would be difficult given that all the parameters are reference type and the calling algorithms are sometimes supplying incompatible arguments (non-complex ones) for the temporary non-const lvalue that is supposed to store a complex result. (In other words, I think that even if tmp somehow acquired the correct complex var type rather than non-complex double, it would still have the problem that it is non-const lvalue, so nothing could be written to the memory to which tmp refers, because that memory sometimes holds just a plain double.)

I’m still waiting on Eigen to respond futher: add madd_no_fma_helper (!37) · Merge requests · libeigen / eigen · GitLab
Maybe people could add their voices to the merge request?

Failing that, we can just specialize the kinda large template of which madd is a member. (Note, it is ugly to do because of the macros that are defined and then undefined.)

I fixed the branch. And can verify that your template overloads worked! I’m not sure what I was doing wrong before—I think I may have had them in the wrong namespace. I needed to add a couple more. Here’s the final set of overloads for ScalarBinaryOpTraits:

namespace Eigen {

// only need patterns with at least one var and at least one complex
// see: https://discourse.mc-stan.org/t/help-testing-release-candidate-for-std-complex/12314/54

template <typename BinaryOp>
struct Eigen::ScalarBinaryOpTraits<double, std::complex<stan::math::var>,
                                   BinaryOp> {
  typedef std::complex<stan::math::var> ReturnType;
};

template <typename BinaryOp>
struct Eigen::ScalarBinaryOpTraits<stan::math::var,
                                   std::complex<double>, BinaryOp> {
  typedef std::complex<stan::math::var> ReturnType;
};

template <typename BinaryOp>
struct Eigen::ScalarBinaryOpTraits<std::complex<double>,
                                   std::complex<stan::math::var>, BinaryOp> {
  typedef std::complex<stan::math::var> ReturnType;
};

template <typename BinaryOp>
struct Eigen::ScalarBinaryOpTraits<std::complex<double>,
                                   stan::math::var, BinaryOp> {
  typedef std::complex<stan::math::var> ReturnType;
};

template <typename BinaryOp>
struct Eigen::ScalarBinaryOpTraits<std::complex<stan::math::var>, double,
                                   BinaryOp> {
  typedef std::complex<stan::math::var> ReturnType;
};

template <typename BinaryOp>
struct Eigen::ScalarBinaryOpTraits<std::complex<stan::math::var>,
                                   stan::math::var, BinaryOp> {
  typedef std::complex<stan::math::var> ReturnType;
};

template <typename BinaryOp>
struct Eigen::ScalarBinaryOpTraits<std::complex<stan::math::var>,
                                   std::complex<double>, BinaryOp> {
  typedef std::complex<stan::math::var> ReturnType;
};

}  // namespace Eigen

Here’s the test that all the combinations compile:

TEST(mathMix, multiplicationPatterns) {
  Eigen::MatrixXd d(2, 2);
  d << 1, 2, 3, 4;
  Eigen::Matrix<var_t, -1, -1> v(2, 2);
  v << 1, 2, 3, 4;
  Eigen::Matrix<cdouble_t, -1, -1> cd(2, 2);
  cd << 1, 2, 3, 4;
  Eigen::Matrix<cvar_t, -1, -1> cv(2, 2);
  cv << 1, 2, 3, 4;

  auto d_d = d * d;
  auto d_v = d * v;
  auto d_cd = d * cd;
  auto d_cv = d * cv;

  auto v_d = v * d;
  auto v_v = v * v;
  auto v_cd = v * cd;
  auto v_cv = v * cv;

  auto cd_d = cd * d;
  auto cd_v = cd * v;
  auto cd_cd = cd * cd;
  auto cd_cv = cd * cv;

  auto cv_d = cv * d;
  auto cv_v = cv * v;
  auto cv_cd = cv * cd;
  auto cv_cv = cv * cv;
}

Yatta!

2 Likes

Great work guys!

I was experiencing the problem with the dynamic sizes too, and my overloads are definitely in the correct namespace.

What about fvar? I need it for my work.

If I had to guess based on the observed differences in ScalarBinaryOpTraits from my large template block vs. @philipm’s style, it is possible that treating BinaryOp as a non template-template-parameter may indirectly prevent the problematic packet math template with madd from being invoked. Of course, this also makes it harder to construct the relevant generalized overloads (say, for fvar).

I wonder if it is possible to switch the template conditions in the original templated ScalarBinaryOpTraits so that all the conditions are inside BinaryOp, and so that the first occurrences of T1 and T2 are directly deducible from the first two arguments of ScalarBinaryOpTraits rather than inside the 3rd argument (BinaryOp). Also, I wonder if Eigen makes any checks on BinaryOp with more or less than 2 arguments in the packet math case, which would also cause problems.

Presumably we just need to add the corresponding fvar<T> type with an additional template parameter wherever we have var now. There’s no need to mix var and fvar—it doesn’t really make sense. Same way it doesn’t make sense to mix fvar<double> and fvar<fvar<double>> at the top level.