Now that we have nested gradients in reverse mode chains, per this recent PR, we can improve the differentiation of the algebraic solver.

For f(u, \vartheta) = 0, where u is the solution to the algebraic equation and \vartheta the inputs which require sensitivities, the implicit function theorem gives the Jacobian

\frac{\mathrm d u}{\mathrm d \vartheta} = - \left(\frac{\mathrm d f}{\mathrm d u}\right)^{-1} \frac{\mathrm d f}{\mathrm d \vartheta}

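To make the formula concrete, here is a small numerical check in Python/NumPy; the system f(u, \vartheta) = u \circ u - \vartheta and all names below are illustrative stand-ins, not solver code:

```python
import numpy as np

# Hypothetical toy system: f(u, theta) = u*u - theta = 0, solved by u = sqrt(theta).
theta = np.array([1.0, 4.0, 9.0])
u = np.sqrt(theta)                      # algebraic solution, f(u, theta) = 0

df_du = np.diag(2.0 * u)                # df/du for the toy f
df_dtheta = -np.eye(3)                  # df/dtheta for the toy f

# Jacobian from the implicit function theorem: du/dtheta = -(df/du)^{-1} df/dtheta.
J = -np.linalg.solve(df_du, df_dtheta)

# Direct differentiation of the closed-form solution u = sqrt(theta) agrees.
J_direct = np.diag(0.5 / np.sqrt(theta))
print(np.allclose(J, J_direct))         # prints True
```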
Currently, we compute the Jacobian matrix explicitly, but it suffices to compute

w^T \frac{\mathrm d u}{\mathrm d \vartheta} = - w^T \left(\frac{\mathrm d f}{\mathrm d u}\right)^{-1} \frac{\mathrm d f}{\mathrm d \vartheta},

where w is the right initial cotangent, namely w = \frac{\mathrm d \mathcal T}{\mathrm d u}, \mathcal T being the target distribution. We still need to compute

\frac{\mathrm d f}{\mathrm d u}

explicitly (either using n forward mode sweeps or, if we're clever, extracting this matrix from the last step of the solver). We then get

w^T \frac{\mathrm d u}{\mathrm d \vartheta} = \eta^T \frac{\mathrm d f}{\mathrm d \vartheta}

in one reverse mode sweep and use a `solve` method on

\left(\frac{\mathrm d f}{\mathrm d u}\right)^T \eta = - w

to get the requisite sensitivities. This approach spares us:

(i) computing \mathrm d f / \mathrm d \vartheta explicitly

(ii) a matrix-matrix solve, which becomes a single matrix-vector solve.
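The scheme can be sketched end to end. Everything below is a stand-in under stated assumptions: a hypothetical linear test system f(u, \vartheta) = A u - B \vartheta, and a hand-coded reverse sweep where Stan Math would use the nested reverse-mode chain:

```python
import numpy as np

# Toy test system: f(u, theta) = A u - B theta = 0 with n states and
# m parameters, so df/du = A and df/dtheta = -B.
rng = np.random.default_rng(0)
n, m = 4, 7
A = rng.normal(size=(n, n)) + 5.0 * np.eye(n)   # keep df/du well conditioned
B = rng.normal(size=(n, m))
w = rng.normal(size=n)                          # initial cotangent, w = dT/du

df_du = A                                       # computed explicitly (n forward
                                                # sweeps, or read off the solver)

def reverse_sweep(eta):
    """One reverse-mode sweep through f: returns eta^T df/dtheta.

    Hand-coded for the toy f (df/dtheta = -B); in Stan Math this would
    come from the nested reverse-mode chain, never forming df/dtheta."""
    return -eta @ B

# Matrix-vector solve for the cotangent: (df/du)^T eta = -w.
eta = np.linalg.solve(df_du.T, -w)
sensitivities = reverse_sweep(eta)              # w^T du/dtheta, length m

# Reference: the current explicit route, a matrix-matrix solve that forms
# the full n-by-m Jacobian du/dtheta before contracting with w.
J = -np.linalg.solve(df_du, -B)
print(np.allclose(sensitivities, w @ J))        # prints True
```

Note the right-hand sides: the explicit route solves against an n-by-m matrix, while the cotangent route solves against a single n-vector, which is where savings (i) and (ii) come from.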