Complex number support in Math library autodiff


#21

I am yet to be convinced there is sufficient need to expose complex numbers. Maybe some exposed functions can return a tuple with real and imaginary parts (perhaps not in the transformed parameters or model blocks), but I think most useful things can be implemented internally in Stan Math.


#22

I am in agreement - unless someone implements a full complex algebra everything will be piecemeal implementations that are more appropriately wrapped in the function for which the partial complex algebra implementation is needed. In that case we should be able to get away with functions that return real outputs (i.e. power and phase functions from an FFT, magnitude and phase functions for input complex numbers, etc) instead of a partial complex algebra implementation that is going to be extraordinarily confusing to users.


#23

On the original question, when I asked for this, I just wanted the largest eigenvalue for certain nonsymmetric matrices. This was actually for cases where you can guarantee they are real, but it seems you cannot compute them without complex numbers.


#27

Thanks for putting this together. This is definitely going to go beyond my analysis competence, so someone else should review it.

Spec for what?

The summary for the PR should be more specific as to what “initial attempt” means and what the scope of the PR is. The cited issue is just a bug report about a particular function. Is the PR to fix that function or is it more general? Will the behavior of other functions change?

We don’t want to start merging works in progress until they provide useful functionality.


#31

We don’t have any official standards yet for specs and we don’t want to double review things, so I’ll defer to @seantalts and @bgoodri who seem to understand the scope of this project.

I’m curious as to whether this is going to lead to new functions in Stan or different behavior of old functions. @bgoodri suggested during the meeting that the goal isn’t a complex type in Stan, but just support for some Eigen algorithms that use complex numbers under the hood.


#32

Specifically, in order to use complex numbers in the Stan language, it would presumably have to be the case that you could pass a complex number to any function you could pass a real number to, which is probably a long ways off. But I think if complex numbers could be used in Stan Math, and Eigen does not crash when it uses complex arithmetic internally, and everything for complex numbers in the C++14 standard worked
https://en.cppreference.com/w/cpp/numeric/complex
it would be worth merging.


#35

I don’t think we have anything exposed in the Stan language that works on doubles but not vars. There are now some “data only” arguments to ODE integrators. There are some Bessel functions that are only defined for integer orders that should be generalized to real numbers.

I am not sure what we may have in Stan Math that is double only but that is fine. I agree we can progressively add more functions that use complex numbers to Stan Math. I just thought that if we implement what the C++ standard defines, then we are presumably safe from Eigen using something that we lack an implementation of.


#36

That’s correct. Everything in the Stan language works for both things declared as data and as parameters. We have tests to verify that it is parsed, generates c++, and that it compiles.

Almost everything in the Stan language lines up directly with a math function.


#37

My confusion is that I don’t see a proposed scope anywhere.

I think that would depend on whether they have specializations of complex<T> anywhere that aren’t covered by the more general cases.

Implementing the scalar traits class from Eigen is helpful for efficiency. We did that for var arguments.


#38

@ChrisChiasson 's scope is stated in the PR

Intended Effect:
Complex numbers work. For me, this means that it satisfies what I needed to do with complex numbers, which I described (after being prompted below) as:
Conceptually, and ignoring details, it is roughly equivalent to this: I have some tables of complex numbers from which I am indirectly generating linear combinations of some of the columns. Later, I am interpolating (via circular lerp) the resulting combination columns and integrating a function of the amplitudes. I have some checks for my integrations, and I will use multidimensional newton’s method (requiring gradient and hessian) to minimize the error norm between the checks and the integrations as a function of the six coefficients of the aforementioned linear combinations and see if the result makes physical sense. My intention is to implement enough to accomplish this.
It also means that the PR passes the tests I have added for forward, reverse, and the special mixed mode test that uses newton’s method and involves checking (the definition of) eigendecomposition of a rotation matrix while converging to the point where the merit function is minimized because the real and imaginary parts of the first returned eigenvalue are equal.

Which is fine, but I don’t think that Newton thing, by itself, is of interest to (m)any additional people. So, the question is whether there is enough foundational stuff that is of interest to other people? I think there is because it essentially extends Stan Math’s autodiff framework to complex numbers where the real and imaginary parts are var or I guess fvar<var>. I think what @betanalpha was asking for — a complete complex algebra — is basically in the PR (+, -, *, / plus the extraction and special functions defined in the C++14 standard).

But I don’t see how pushing complex numbers into the Stan language could possibly be in scope in the near future. It would be a ton of work, and I don’t think @ChrisChiasson needs all that, so we would have to find someone else willing and capable to do a PR for Stan Library.

My scope is that we should be able to write Stan Math functions that take var matrices, call Eigen functions that use complex arithmetic internally, and return var output (maybe separating into real and imaginary parts). That is a bit more than what @ChrisChiasson wants to do personally, but it not much for me to add if we have implementations of all the functions Eigen is calling on complex numbers.

There is a compilation error related to the copysign function that I don’t understand, and not great test coverage. I am pretty sure it can’t break any double or var things though.


#41

Right, what we are calling “Stan Math” is what you mean by the C++. The vast majority of the functions in the prim and rev subdirectories are exposed in the Stan Language. There may have been some confusion as to whether your ultimate goal was to be able to write in the Stan language something like

data {
  int N;
  int K;
  complex_matrix[N, K] X;
  complex_vector[N] y;
}
parameters {
  vector[K] theta;
}
model {
  target += some_complex_lpdf(y | X * theta);
  ...
}

My interpretation is that your goal was only to be able to write autodifferentiable stuff in C++ by hand, rather than autogenerating the C++ via the Stan parser. That is much more feasible to merge.


#43

Thanks! That’s exactly what I was missing.

Presumably there’ll be a list of supported Eigen functions. Or will the assumption being they’ll all work, and we’ll just try to patch as we find new functions that don’t work?

Will this extend the domain of any functions we already have? Will it let us add new functions?

Nobody is expecting that—it’d be a ton of work and not necessary at all for this goal.


#44

So far, Eigen things are working fine with autodiff. I guess Eigen wrapper functions can be added to Stan Math one at a time (after this PR goes in) rather than listing them in advance. FFT and pseudo-eigendecomposition would probably be the first couple.


#45

That’d be great—there seems to be some pent-up demand for FFT—at least judging from the last time we brought it up.


#46

Thanks for implementing this. My response will be in two parts: the first as the maintainer of the math library. The second as me.

As maintainer of the math library

I’ll just drop in some bullet points.

  • when things get merged into develop, the community of developers is responsible for that code
  • prototypes are cool, but as long as they’re prototypes, they can live on branches or forks. They won’t get merged until they’re ready.
  • although the main client is the Stan language, we can add things to the C++ in the math library that won’t get exposed to the Stan language.
  • if this were to get merged, I would like to see someone else sign off on it stating they understand the implementation details well enough to hunt bugs through the implementation.

As me, personally

I don’t know how to compute with complex numbers so well, so I trust the reviews of my colleagues. More bullet points.

  • I’d like to review naming for consistency.
  • Are derivatives not taken with respect to the imaginary component?
  • Some of the doxygen doc doesn’t seem correct; or maybe it needs more explaining. Especially around ADL.
  • Thanks! I skimmed it and it looks reasonable. Although, I’d need more time to track through the template metaprogramming. I still haven’t gotten around how zeroing works.
  • Did you review the rules for implementing things in the std namespace? Just checking.
  • Since var.hpp changed, I’d really like to see difference in performance to something that’s completely unrelated to this pull request. If this has any negative impact on the runtime for all models, we’d want know and find another way to code it.

I think that’s it for now. Thanks for putting it together.


#49

Sorry about that – I meant fvar.hpp.

They’re terms I haven’t seen floating around before. If they’re common in the C++ template world, that’s great and I just haven’t seen them. But if they’re not, it’d really help using terminology that could be understood.

I hope so? Hard to help without knowing what the error is.

Last time we did this, we looked at this pretty extensively: https://en.cppreference.com/w/cpp/language/extending_std

That’s exactly what I thought you were doing. What I don’t understand yet is the exact use case for complex variables.

What I’m trying to ask is… do we need the gradient with respect to some set of variables for just the real component or the imaginary component and never both? Or is the use case to take the gradient with respect to the same set of variables for both components?

If it’s the second, then I think this design falls short. In order to get the gradients with respect to both the real and imaginary components, you’d have to do a forward and backwards sweep for both the real and the imaginary components. That’s fine, but if that’s what you’re trying to do, then I think we need to make that operation easier than having the user have to compute the exact same complex number twice and clear out the autodiff stack appropriately.

If it’s the first, then we’re good. I just don’t know enough about the use cases and am relying on you, @bgoodri, and whoever are experts here to tell us how they’d be used.


#50

@ChrisChiasson, just responding to your comments on github here where it’s a little easier to have a discussion.

ADL is common in C++. It is what makes unprefixed functions (like pow), and thus, all operators (like + or <<), call the right overloads. It works by bringing functions in the same namespace as the function or template parameter lists into the overload resolution set.

Yes, understood. ADL is common, but I don’t understand this phrase: “forwarding for ADL.” That’s all that was in the comment, nothing more, nothing less: // forwarding for ADL.

I will try to answer the rest later. Your questiin about what derivatives are taken does not make any sense to me. Can you try writing the minimum math equation that would demonstrate your point? Try to expand all the complex variables i the equation into real + i * real form so I can understand.

It feels like a chicken and egg problem to me. There isn’t a spec, so I don’t even know what I’m asking. But I’ll try.

I’ll just be pretty hand-wavy about it all.

Suppose the real component is computed using f(\theta, x) and the imaginary component is computed using g(\theta, x). So the complex variable that’s output is now f(\theta, x) + i * g(\theta, x).

Question:

  • will you only ever need \frac{df(\theta, x)}{d\theta_1}, \ldots, \frac{df(\theta, x)}{d\theta_N}?
  • will you only ever need \frac{dg(\theta, x)}{d\theta_1}, \ldots, \frac{dg(\theta, x)}{d\theta_N}?
  • are there any situations where you’ll need both \frac{df(\theta, x)}{d\theta_1}, \ldots, \frac{df(\theta, x)}{d\theta_N} and \frac{dg(\theta, x)}{d\theta_1}, \ldots, \frac{dg(\theta, x)}{d\theta_N}?
  • is this question nonsensical because when you need gradients, you’re only ever looking at operations that result in real components only?

I’m sure you can construct a std::complex<stan::math::var> such that you can add, subtract, do other operations on it. That’s what the pull request looks like. What’s not clear is if you had this complex variable with a real stan::math::var and an imaginary stan::math::var, what are you supposed to do with it? Your tests only took the gradients with respect to the real part. In a comment you mentioned you could easily take a gradient with respect to the imaginary part. I get that. But the autodiff isn’t set up to do both automatically. There really isn’t a way to do it easily. (It can be done, just needs to have nested autodiff, I think.)

So anyway, some more clarity? I don’t think about complex numbers much, so even some sort of reference to what you’d use. Maybe a question would be… do you expect the accumulated target to be a complex number?


#51

I think the simplest way to think about things is via identities such as e^{i \pi} = -1. If we let z = 0 + i \pi be a complex function, we can differentiate the left-hand side with respect to z. Since e^{z} = \cos\left(\pi\right) + i \sin\left(\pi\right), the derivative is given by \frac{d}{dz} e^{z} = -\sin\left(\pi\right) + i \cos\left(\pi\right) = 0 - i. And we have a unit test for (part of) this.


But I don’t know how to write a unit test that imag(f).grad(x, g); implies g[0] is i.


#52

Looking at what code gets executed when we do some tests involving std::complex<stan::math::var>, I see that that line 77 of vari.hpp is called seven times despite the code comment saying // this will never get called. Does anyone know how to investigate this?

e[0;36m/home/ben/stan-dev/math/./stan/math/rev/core/vari.hpp:
e[0m       |    1|#ifndef STAN_MATH_REV_CORE_VARI_HPP
       |    2|#define STAN_MATH_REV_CORE_VARI_HPP
       |    3|
       |    4|#include <stan/math/rev/core/chainable_alloc.hpp>
       |    5|#include <stan/math/rev/core/chainablestack.hpp>
       |    6|#include <ostream>
       |    7|
       |    8|namespace stan {
       |    9|namespace math {
       |   10|
       |   11|// forward declaration of var
       |   12|class var;
       |   13|
       |   14|/**
       |   15| * The variable implementation base class.
       |   16| *
       |   17| * This class is complete (not abstract) and may be used for
       |   18| * constants.
       |   19| *
       |   20| * A variable implementation is constructed with a constant
       |   21| * value. It also stores the adjoint for storing the partial
       |   22| * derivative with respect to the root of the derivative tree.
       |   23| *
       |   24| * The chain() method applies the chain rule. Concrete extensions
       |   25| * of this class will represent base variables or the result
       |   26| * of operations such as addition or subtraction. These extended
       |   27| * classes will store operand variables and propagate derivative
       |   28| * information via an implementation of chain().
       |   29| */
       |   30|class vari {
       |   31| private:
       |   32|  friend class var;
       |   33|
       |   34| public:
       |   35|  /**
       |   36|   * The value of this variable.
       |   37|   */
       |   38|  const double val_;
       |   39|
       |   40|  /**
       |   41|   * The adjoint of this variable, which is the partial derivative
       |   42|   * of this variable with respect to the root variable.
       |   43|   */
       |   44|  double adj_;
       |   45|
       |   46|  /**
       |   47|   * Construct a variable implementation from a value.  The
       |   48|   * adjoint is initialized to zero.
       |   49|   *
       |   50|   * All constructed variables are added to the stack.  Variables
       |   51|   * should be constructed before variables on which they depend
       |   52|   * to insure proper partial derivative propagation.  During
       |   53|   * derivative propagation, the chain() method of each variable
       |   54|   * will be called in the reverse order of construction.
       |   55|   *
       |   56|   * @param x Value of the constructed variable.
       |   57|   */
  18.8M|   58|  explicit vari(double x) : val_(x), adj_(0.0) {
  18.8M|   59|    ChainableStack::instance().var_stack_.push_back(this);
  18.8M|   60|  }
       |   61|
       |   62|  vari(double x, bool stacked) : val_(x), adj_(0.0) {
       |   63|    if (stacked)
       |   64|      ChainableStack::instance().var_stack_.push_back(this);
       |   65|    else
       |   66|      ChainableStack::instance().var_nochain_stack_.push_back(this);
       |   67|  }
       |   68|
       |   69|  /**
       |   70|   * Throw an illegal argument exception.
       |   71|   *
       |   72|   * <i>Warning</i>: Destructors should never called for var objects.
       |   73|   *
       |   74|   * @throw Logic exception always.
       |   75|   */
      7|   76|  virtual ~vari() {
      7|   77|    // this will never get called
      7|   78|  }
       |   79|
       |   80|  /**
       |   81|   * Apply the chain rule to this variable based on the variables
       |   82|   * on which it depends.  The base implementation in this class
       |   83|   * is a no-op.
       |   84|   */
  1.70G|   85|  virtual void chain() {}
       |   86|
       |   87|  /**
       |   88|   * Initialize the adjoint for this (dependent) variable to 1.
       |   89|   * This operation is applied to the dependent variable before
       |   90|   * propagating derivatives, setting the derivative of the
       |   91|   * result with respect to itself to be 1.
       |   92|   */
   474k|   93|  void init_dependent() { adj_ = 1.0; }
       |   94|
       |   95|  /**
       |   96|   * Set the adjoint value of this variable to 0.  This is used to
       |   97|   * reset adjoints before propagating derivatives again (for
       |   98|   * example in a Jacobian calculation).
       |   99|   */
  2.75G|  100|  void set_zero_adjoint() { adj_ = 0.0; }
       |  101|
       |  102|  /**
       |  103|   * Insertion operator for vari. Prints the current value and
       |  104|   * the adjoint value.
       |  105|   *
       |  106|   * @param os [in, out] ostream to modify
       |  107|   * @param v [in] vari object to print.
       |  108|   *
       |  109|   * @return The modified ostream.
       |  110|   */
       |  111|  friend std::ostream& operator<<(std::ostream& os, const vari* v) {
       |  112|    return os << v->val_ << ":" << v->adj_;
       |  113|  }
       |  114|
       |  115|  /**
       |  116|   * Allocate memory from the underlying memory pool.  This memory is
       |  117|   * is managed as a whole externally.
       |  118|   *
       |  119|   * Warning: Classes should not be allocated with this operator
       |  120|   * if they have non-trivial destructors.
       |  121|   *
       |  122|   * @param nbytes Number of bytes to allocate.
       |  123|   * @return Pointer to allocated bytes.
       |  124|   */
  19.8M|  125|  static inline void* operator new(size_t nbytes) {
  19.8M|  126|    return ChainableStack::instance().memalloc_.alloc(nbytes);
  19.8M|  127|  }
       |  128|
       |  129|  /**
       |  130|   * Delete a pointer from the underlying memory pool.
       |  131|   *
       |  132|   * This no-op implementation enables a subclass to throw
       |  133|   * exceptions in its constructor.  An exception thrown in the
       |  134|   * constructor of a subclass will result in an error being
       |  135|   * raised, which is in turn caught and calls delete().
       |  136|   *
       |  137|   * See the discussion of "plugging the memory leak" in:
       |  138|   *   http://www.parashift.com/c++-faq/memory-pools.html
       |  139|   */
    237|  140|  static inline void operator delete(void* /* ignore arg */) { /* no op */
    237|  141|  }
       |  142|};
       |  143|
       |  144|}  // namespace math
       |  145|}  // namespace stan
       |  146|#endif


#53

@Bob_Carpenter can correct me if I’m wrong, but the autodiff memory allocation was designed in a way so that the vari destructors should never get called (everything goes onto an internally allocated stack and deallocation is simply moving the pointer to the top of that stack back to zero)

In the end there are a few vari implementations that need their destructors called cause they allocate memory outside of this stack (and need their destructors called it clean it up)

There’s an issue to fix it here: https://github.com/stan-dev/math/issues/565 . Bob said he vaguely remembered it was probably something to do with the things that used ldlt factorizations. tldr;, the comment is misleading I think. Do you use any of the functions in that issue? If so your destructor might get called.