Now that we have nested gradients in reverse mode chains, per this recent PR, we can improve the differentiation of the algebraic solver.

For f(u, \vartheta) = 0, where u is the solution to the algebraic equation and \vartheta the inputs which require sensitivities, the implicit function theorem gives the Jacobian

\frac{\mathrm d u}{\mathrm d \vartheta} = - \left(\frac{\mathrm d f}{\mathrm d u}\right)^{-1} \frac{\mathrm d f}{\mathrm d \vartheta}

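To make the formula concrete, here is a small numerical check in Python/NumPy; the system f(u, \vartheta) = u \circ u - \vartheta and all names below are illustrative stand-ins, not solver code:

```python
import numpy as np

# Hypothetical toy system: f(u, theta) = u*u - theta = 0, solved by u = sqrt(theta).
theta = np.array([1.0, 4.0, 9.0])
u = np.sqrt(theta)                      # algebraic solution, f(u, theta) = 0

df_du = np.diag(2.0 * u)                # df/du for the toy f
df_dtheta = -np.eye(3)                  # df/dtheta for the toy f

# Jacobian from the implicit function theorem: du/dtheta = -(df/du)^{-1} df/dtheta.
J = -np.linalg.solve(df_du, df_dtheta)

# Direct differentiation of the closed-form solution u = sqrt(theta) agrees.
J_direct = np.diag(0.5 / np.sqrt(theta))
print(np.allclose(J, J_direct))         # prints True
```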
Currently, we compute the Jacobian matrix explicitly, but it suffices to compute

w^T \frac{\mathrm d u}{\mathrm d \vartheta} = - w^T \left(\frac{\mathrm d f}{\mathrm d u}\right)^{-1} \frac{\mathrm d f}{\mathrm d \vartheta},

where w is the right initial cotangent, namely w = \frac{\mathrm d \mathcal T}{\mathrm d u}, \mathcal T being the target distribution. We still need to compute

\frac{\mathrm d f}{\mathrm d u}

explicitly (either using n forward mode sweeps or, if we're clever, extracting this matrix from the last step of the solver). We then get

w^T \frac{\mathrm d u}{\mathrm d \vartheta} = \eta^T \frac{\mathrm d f}{\mathrm d \vartheta}

in one reverse mode sweep and use a `solve` method on

\left(\frac{\mathrm d f}{\mathrm d u}\right)^T \eta = - w

to get the requisite sensitivities. This approach spares us:

(i) computing \mathrm d f / \mathrm d \vartheta explicitly

(ii) a matrix-matrix solve, which becomes a single matrix-vector solve.
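The scheme can be sketched end to end. Everything below is a stand-in under stated assumptions: a hypothetical linear test system f(u, \vartheta) = A u - B \vartheta, and a hand-coded reverse sweep where Stan Math would use the nested reverse-mode chain:

```python
import numpy as np

# Toy test system: f(u, theta) = A u - B theta = 0 with n states and
# m parameters, so df/du = A and df/dtheta = -B.
rng = np.random.default_rng(0)
n, m = 4, 7
A = rng.normal(size=(n, n)) + 5.0 * np.eye(n)   # keep df/du well conditioned
B = rng.normal(size=(n, m))
w = rng.normal(size=n)                          # initial cotangent, w = dT/du

df_du = A                                       # computed explicitly (n forward
                                                # sweeps, or read off the solver)

def reverse_sweep(eta):
    """One reverse-mode sweep through f: returns eta^T df/dtheta.

    Hand-coded for the toy f (df/dtheta = -B); in Stan Math this would
    come from the nested reverse-mode chain, never forming df/dtheta."""
    return -eta @ B

# Matrix-vector solve for the cotangent: (df/du)^T eta = -w.
eta = np.linalg.solve(df_du.T, -w)
sensitivities = reverse_sweep(eta)              # w^T du/dtheta, length m

# Reference: the current explicit route, a matrix-matrix solve that forms
# the full n-by-m Jacobian du/dtheta before contracting with w.
J = -np.linalg.solve(df_du, -B)
print(np.allclose(sensitivities, w @ J))        # prints True
```

Note the right-hand sides: the explicit route solves against an n-by-m matrix, while the cotangent route solves against a single n-vector, which is where savings (i) and (ii) come from.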