I believe you are misunderstanding how autodiff works, in which case I highly recommend going over the autodiff paper again. In AD each variable in the expression graph is augmented with some additional information: tangents in forward mode and adjoints in reverse mode. Once this information has been propagated through the entire expression graph you get either components of a directional derivative in forward mode or components of the gradient in reverse mode.

The information is propagated by taking the tangents of the inputs, or the adjoint of the output, of a given function, multiplying by the partials of that function, and passing along the result. For example, in forward mode if we had a function with N inputs and 1 output we’d take the N tangents from the input parameters, multiply each by the corresponding partial df/dx_i, and then add those products up to get the tangent of the function output, which then gets passed along. In reverse mode we do the opposite: take the adjoint from the output, multiply it by each partial df/dx_i, and add each product to the adjoint of the corresponding input.
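To make that concrete, here is a minimal sketch of one propagation step in each mode for a single function with 2 inputs and 1 output. The function `f` and the helper names `partials`, `forward_step`, and `reverse_step` are made up for illustration; this is not how any particular AD library lays things out.

```python
import math

def f(x):
    # Hypothetical example function with 2 inputs and 1 output.
    return x[0] * x[1] + math.sin(x[0])

def partials(x):
    # Analytic partials df/dx_0 and df/dx_1 of the function above.
    return [x[1] + math.cos(x[0]), x[0]]

def forward_step(x, input_tangents):
    # Forward mode: output tangent = sum_i (df/dx_i * tangent of x_i).
    return sum(p * t for p, t in zip(partials(x), input_tangents))

def reverse_step(x, output_adjoint):
    # Reverse mode: contribution to each input adjoint = output adjoint * df/dx_i.
    # In a full graph these contributions are added to whatever is already there.
    return [output_adjoint * p for p in partials(x)]

x = [0.0, 2.0]
print(forward_step(x, [1.0, 0.0]))  # directional derivative along x_0
print(reverse_step(x, 1.0))         # both gradient components in one sweep
```

Seeding the forward step with a unit tangent picks out one partial at a time, whereas seeding the reverse step with an output adjoint of 1 returns all of the partials at once.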

Following this logic, for a reverse mode implementation of a root finder you’d take the adjoints of the roots, multiply them by the appropriate partials, and pass those products along to the auxiliary parameters. If there are N roots and M auxiliary parameters you’ll need N x M partials, which is exactly the number of components in the implicit Jacobian.
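As a rough sketch of what that looks like, consider a toy problem with N = 1 root and M = 2 auxiliary parameters, where x solves x^3 + a*x - b = 0. The equation, the Newton solver, and the function names here are all assumptions chosen for illustration. The partials come from the implicit function theorem, dx/dtheta = -(df/dx)^-1 df/dtheta, so nothing is differentiated through the solver iterations.

```python
def solve_root(a, b, x0=1.0, tol=1e-12, max_iter=100):
    # Plain Newton iteration for the hypothetical equation x^3 + a*x - b = 0.
    x = x0
    for _ in range(max_iter):
        fx = x**3 + a * x - b
        if abs(fx) < tol:
            break
        x -= fx / (3 * x**2 + a)
    return x

def implicit_jacobian(x, a, b):
    # Implicit function theorem at the root: dx/dtheta = -(df/dx)^-1 * df/dtheta.
    # With N = 1 root and M = 2 parameters this is a 1 x 2 Jacobian.
    df_dx = 3 * x**2 + a
    df_da = x
    df_db = -1.0
    return [-df_da / df_dx, -df_db / df_dx]

def root_adjoint_step(root_adjoint, x, a, b):
    # Reverse mode: multiply the root's adjoint by each partial and pass the
    # products along to the adjoints of the parameters a and b.
    return [root_adjoint * j for j in implicit_jacobian(x, a, b)]

a, b = 1.0, 2.0
x = solve_root(a, b)
print(x, implicit_jacobian(x, a, b), root_adjoint_step(1.0, x, a, b))
```

With more than one root, df/dx becomes an N x N matrix and the division turns into a linear solve, but the reverse mode step is still the root adjoints multiplied by the N x M implicit Jacobian.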