Complex number support in Math library autodiff

There’s a thread going on in a Pull Request on the Math repo about complex number support. It looks like we need to take a step back and figure out what applications we’d like to support and then write out a spec for it before going back to coding. Let’s use this thread for those purposes.

To start us off, what are people hoping to do with complex numbers in Stan? @ChrisChiasson is leading the implementation efforts and thus the one to convince ;)

If we can’t reach consensus, perhaps we should #define some kind of flag to turn on the complex implementation. That way, the patch could land and be part of the testing process so it won’t bit rot, but users who don’t know about the implementation status will be protected.

From my experience with the project the way to go is to find the use-case you want to support (particularly one that can’t be done in straightforward fashion currently) and implement that on a branch. Any PR you are working on will get ongoing testing when you commit so you don’t have to worry about bit rot anyway. I agree with @Bob_Carpenter that a spec would be useful but if I were you I would try to focus that on the minimal set of operations you need for your use-case and enough related operations so that you can doc it without a million caveats.

This is hard and it’s the same reason we don’t have support for sparse operations but in the develop branch there’s no way around juggling the implementation/future maintenance/doc etc…

1 Like

The PR seems like it is going to be closed

Anytime you need to push testing burden onto the testing pipeline you could re-open the PR. What we want to avoid is kicking of the testing every time you make a small change. I know it can seem like being re-buffed when a PR is closed but that’s not what it means in this project. We just have computationally demanding testing. It would be cool (@seantalts ?) if these not-quite-ready PR’s could get lower priority so the testing only runs when the pipeline has extra room but that’s more complicated than doing it manually for now.

Sorry to sound like a broken record, but we can’t reach a decision without a design. I don’t know what this feature is supposed to be doing.

We try to work by consensus, but it’s not 100% required. Daniel Lee has final authority on the math lib and I have final authority on the langauge. I’m still not even sure if there are implications for the language of including a complex type.

I don’t know how @seantalts or @syclik feel about this, but I’d strongly prefer to not include speculative work commented out in our releasead code.

Instead, I’d suggest working on a branch. That’s what we’ve done with MPI and GPU development as it was converging toward a final design. And both of those are finally about to launch. This shouldn’t be as complicated as those two in terms of builds, etc., but it’s in many ways a bigger design decision as the hard parts of the GPU and MPI design were internal, not user-facing.

Exactly. There has also been the poor organization of the AST in the parser holding us back, but that’s being remediated and we should soon have tuples and ragged arrays.

What will then be holding sparse matrix operations back is a coherent design that fits into the rest of the way Stan works, and in particular the way parameters are statically sized and unchanging so that we can do posterior analysis. The complex case lacks some of these moving parts.

1 Like

An application would be complex step approximation. We can use it to calculate the adjoint of some higher dimension analytical functions, such as matrix exponential.

Is there any advantage to doing that over just using autodiff? What would it require to be implemented for complex types?

For a general analytical function f, complex step approximation calculates f' to arbitrary precision in one step of complex function evaluation. For example, take f as our matrix_exp(A), where A is N by N matrix, and we want to calculate df(A)/dA_ij for i=1, ...N, j=1,...N. Autodiff does this with N^4 scalar derivative calculations. Complex step approximation does it with one complex evaluation Im(A + ihI)/h, with h the approximation step. The complexity of the method is hidden in complex evaluation, and it’s more expensive than real-version matrix_exp(A), but it’s still a gain compared to N^4 derivative calculations.

EDIT: it’s df(A)/dA_ij, not previously dA/dA_ij.

That would require complex arithmetic for every operation called, which we don’t currently have. But it sounds like it wouldn’t need to build up an expression graph, which is a huge win. I can’t even imagine how it can do this with a single increment, but then my knowledge of complex analysis extends only so far as knowing i = \sqrt{-1}.

Just the Jacobian alone is size \mathcal{O}(n^4) for an n \times n matrix function like matrix exponential. So we need to store that many partials in the expression graph no matter how we calculate them unless we can feed the result into some kind of reduction to reduce the number of partials we need to store.

What’s the complexity of the double-based calculation?

P.S. It’s so nice to finally have LaTeX! Just use, e.g., $e^x$ to get e^x. It even does previews on the fly.

I’m getting “Math processing error” on my phone. :(

Th downside is that you need an internet-connected MathJax server. I hope it at least shows you the raw LaTeX!

One can use vector or matrix to store a complex pair. Using matrix is straight-forward, just code the SO(2) mappings as it is isomorphic to complex circle group. Using vector of length 2 is more efficient in memory, but arithmetic-wise slightly more messy. It’s perfect for a fine-grained GPU implementation.

If you can give a small matrix example, I can see if the patch can do it. I found this 1D example.

Here’s the use-case @ChrisChiasson posted in the PR:

Conceptually, and ignoring details, it is roughly equivalent to this: I have some tables of complex numbers from which I am indirectly generating linear combinations of some of the columns. Later, I am interpolating (via circular lerp) the resulting combination columns and integrating a function of the amplitudes. I have some checks for my integrations, and I will use multidimensional newton’s method (requiring gradient and hessian) to minimize the error norm between the checks and the integrations as a function of the six coefficients of the aforementioned linear combinations and see if the result makes physical sense. My intention is to implement enough to accomplish this.

Regardless if the patch will work or not, do we want to target specific applications or a general complex number AD? If just for some applications, how do we emit information to developers and users what operantions support complex number and what don’t?

Discourse app on android shows latex correctly.

I think for complex stepping the matrix exponential, it is fine to do it internally and we don’t need a complex number AD just for that.

This is one of the major questions I have—are we looking at something that’s an internal convenience to grease some of the Eigen algorithms, or are we looking at exposing complex results.

There was a lot of interest a year or so ago in exposing an FFT operation that returned both real and complex results in something like a pair. We should be getting pairs soon, so we may want to come back to that. The plan at the time was not to have a complex type, but to return the regular and complex components to make it sure we are only doing regular autodiff w.r.t. the variables in the complex numbers.