Improved differentiation of the algebraic solver

Now that we have nested gradients in reverse-mode chains (per this recent PR), we can improve the differentiation of the algebraic solver.

For f(u, \vartheta) = 0, where u is the solution to the algebraic equation and \vartheta are the inputs that require sensitivities, the Jacobian is

\frac{\mathrm d u}{\mathrm d \vartheta} = - \left [ \frac{\mathrm d f}{\mathrm d u} \right]^{-1} \frac{\mathrm d f}{\mathrm d \vartheta}
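For completeness, this follows from differentiating the identity f(u(\vartheta), \vartheta) = 0 with respect to \vartheta (note the minus sign from the implicit function theorem):

```latex
% Total derivative of f(u(\vartheta), \vartheta) = 0 w.r.t. \vartheta:
\frac{\mathrm d f}{\mathrm d u} \, \frac{\mathrm d u}{\mathrm d \vartheta}
  + \frac{\mathrm d f}{\mathrm d \vartheta} = 0
\quad \Longrightarrow \quad
\frac{\mathrm d u}{\mathrm d \vartheta}
  = - \left[ \frac{\mathrm d f}{\mathrm d u} \right]^{-1}
      \frac{\mathrm d f}{\mathrm d \vartheta}
```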

Currently we compute the Jacobian matrix explicitly, but it suffices to compute

\left [ \frac{\mathrm d u}{\mathrm d \vartheta} \right ]^T w

where w is the appropriate initial cotangent, namely w = \frac{\mathrm d \mathcal T}{\mathrm d u}, with \mathcal T the target distribution. We still need to compute

\frac{\mathrm d f}{\mathrm d u}

explicitly (either using n forward-mode sweeps or, if we're clever, extracting this matrix from the last step of the solver). We then solve

\left [ \frac{\mathrm d f}{\mathrm d u} \right ]^T \eta = w

for \eta (a matrix-vector solve) and compute

\left [ \frac{\mathrm d f}{\mathrm d \vartheta} \right ]^T \eta

in one reverse-mode sweep, which (after a sign flip) gives the requisite sensitivities. This approach:
(i) spares us computing \mathrm d f / \mathrm d \vartheta explicitly, and
(ii) replaces a matrix-matrix solve with a matrix-vector solve.


One mild annoyance I've had when computing Jacobian^T-adjoint products is that grad takes a vari and calls init_dependent on it before running the chain.

Since we fill the adjoints manually in this case, it would be convenient to skip that init_dependent call (which sets the adjoint of the vari passed to grad to 1.0).

In a branch somewhere I changed this:

To be something like:

```cpp
static void grad(vari* vi = NULL) {
```