I have two desiderata for this.
-
If we have to represent an explicit dense Jacobian for functions with multivariate output, we’re toast. Just consider something like matrix inverse—that’s an N^4 Jacobian for an N \times N matrix. Therefore, a model like operands-and-partials can be at most a partial solution, at least until it can be sparse as @wds15 is suggesting .
-
If we have to compute derivatives with respect to non-parameter inputs, we’re losing a lot of efficiency. At the same time, if we comput them separately without sharing any computation, we’re also toast.
Reverse mode
So I’ve been thinking we need something along the lines of an adjoint-jacobian product approach. Suppose we have a function of type
T0 foo(T1 x1, ..., TN xN)
for which we’d like to define derivatives.
Conceptually for reverse mode, what we need is to be able to compute the function value (using double
inputs) in the forward pass of reverse mode. Then in the the revese pass, when we hit the result y = foo(x1, ..., xN)
, for each input that’s non-constant, we need to add the product of the adjoint of y
with the Jacobian of y
with resecpt to xn
and then add the result to the adjoint of xn
. All of these operations only involve double
(and int
) values.
One way to do that without fully satisfying desideratum (2) is to write paired functions for the value and the adjoint-Jacobian product with respect to each input.
T0 foo(T1 x1, ...., TN xN) { ... }
T1 foo_adj_jac_prod_1(T0 y, T0 y_adj, T1 x1, ..., TN xN) { ... }
...
TN foo_adj_jac_prod_N(T0 y, T0 y_adj, T1 x1, ..., TN xN) { ... }
The first function foo
defines the operation of the function itself as usual. That function gets executed in the forward pass of reverse mode with double
arguments. We only need to save a reference to the output and each of the inputs as those will contain the relevant pieces of info we need in the reverse pass.
At the point in the reverse pass where we need to execute the chain()
operation on output y
, we pull out the value y
and all inputs x1, ..., XN
as double values as well as the adjoint y_adj
as double values, then pass them all into the adjoint-Jacobian function. The result is then added to the adjoint of xn
.
Forward mode
We need to multiply the input’s tangent value by the Jacobian of the output w.r.t. that input to get a term in the output’s tangent. So we need functions with signatures like this for each input
T0 foo_jacobian_tan_1(T1 x1, ..., TN xN, Tn x1_tan) { ... }
TN foo_jacobian_tan_1(T1 x1, ..., TN xN, Tn xN_tan) { ... }
Then we just apply that to all the non-const inputs and add the results into the tangent of y
. We might need to include the other tangents as arguments, but in that case, I don’t know how to deal with ones that aren’t defined because the input’s a constant.
Jacobians as a shortcut
If we have the Jacobian w.r.t. each input fully expressed as a templated function, we could use it to implement all of the above or just use it directly.