Implicit Function Theorem Math

I’ve attached a calculation of the Jacobian of the roots with respect to the auxiliary parameters that was discussed today at the meeting. I was even nice enough to hold back on defining it for general manifolds. ;-)

@charlesm93 let me know if the calculation and sketch of the subsequent algorithm is clear.

implicit_function_theorem.pdf (63.4 KB)

@betanalpha Thank you very much. Everything seems clear to me! I left out the (-1) factor in my previous calculation of the Jacobian and could only make sense of it after looking at a proof of the theorem.
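For reference, and using my own notation here (which may differ slightly from the attached PDF): for an algebraic system f(x, y) = 0 with roots x(y) and auxiliary parameters y, the theorem gives

```latex
% Implicit function theorem: differentiating the identity f(x(y), y) = 0 with
% respect to y gives J_x (dx/dy) + J_y = 0, which is where the (-1) factor
% comes from in the implicit Jacobian.
\[
  \frac{\partial x}{\partial y}
    = -\left(\frac{\partial f}{\partial x}\right)^{-1}
       \frac{\partial f}{\partial y}
    = -J_x^{-1} J_y .
\]
```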

There is a basic implementation and unit test on GitHub.

Calculating Jacobians of arbitrary functions inside a potentially very costly numerical algorithm rings some bells for me, as the same thing is done for the ODE stuff.

In short: if we want to parallelize this root-finding business with the same approach as for the ODEs, then we should probably align the design here. That is, the default implementation would use AD to get the Jacobian, but we would also allow the Jacobian to be supplied in analytic form in order to shut down AD during the root-finding process. Once that is possible, it will be straightforward to parallelize the root-finding stuff with OpenMP.
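To make that concrete, here is a rough sketch of the kind of dispatch I have in mind, assuming Stan Math's jacobian() functional for the AD default; the names are placeholders, not a proposed API:

```cpp
// Rough sketch only: a helper the root solver could call to get the system
// Jacobian. By default it uses reverse-mode AD; alternatively the user
// supplies an analytic Jacobian functor, so AD can stay switched off inside
// a (potentially OpenMP-parallelized) root-finding loop.
#include <stan/math.hpp>
#include <Eigen/Dense>

// Default: Jacobian of the algebraic system f at x via autodiff.
// f must be callable with an Eigen vector of vars and return one.
template <typename F>
Eigen::MatrixXd system_jacobian(const F& f, const Eigen::VectorXd& x) {
  Eigen::VectorXd fx;
  Eigen::MatrixXd J;
  stan::math::jacobian(f, x, fx, J);
  return J;
}

// Alternative: a user-supplied analytic Jacobian, no AD involved.
template <typename F, typename AnalyticJacobian>
Eigen::MatrixXd system_jacobian(const F& /*f*/, const Eigen::VectorXd& x,
                                const AnalyticJacobian& jac) {
  return jac(x);
}
```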

Just a thought, but I don't know how costly this root-finding is, so maybe it's not worth the effort.

Sebastian

@wds15:

Some clarifications:

Should we want to parallelize this rootfinding business with the same approach as for the ODEs

When you write “parallelize”, you mean create a parallelized implementation of the root-solver, comparable to the one we’re trying to build with the ODE integrator? I’m certainly interested! This is why we (Bill and I) considered using the KINSOL solver instead of Eigen’s. That said, the design should be agnostic to which solver we use.

we would need to make the default implementation use AD to get the Jacobian

I’m not sure what you mean. For the root-solver, we are not solely using AD. If we did that, we’d have to template the entire dogleg function (which is what I did for the current matrix_exp() function). We do use jacobian() to get Jx and Jy, where I’m using Michael’s notation; a rough sketch of that pattern is below. Which Jacobian would the user be passing? ∂h(y)/∂y?
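For concreteness, the pattern looks roughly like this (a sketch only; f_system is a placeholder for the algebraic system f(x, y), and it would need to be templated so it accepts var and double arguments in either slot):

```cpp
// Sketch of getting Jx = df/dx and Jy = df/dy from the jacobian() functional
// by holding one block of arguments fixed at a time. Names are illustrative.
#include <stan/math.hpp>
#include <Eigen/Dense>

template <typename F>
void solver_jacobians(const F& f_system, const Eigen::VectorXd& x_sol,
                      const Eigen::VectorXd& y,
                      Eigen::MatrixXd& Jx, Eigen::MatrixXd& Jy) {
  Eigen::VectorXd fx;
  // Jx: differentiate with respect to the unknowns x, parameters y held fixed.
  stan::math::jacobian(
      [&](const auto& x) { return f_system(x, y); }, x_sol, fx, Jx);
  // Jy: differentiate with respect to the parameters y, solution x held fixed.
  stan::math::jacobian(
      [&](const auto& y_) { return f_system(x_sol, y_); }, y, fx, Jy);
}
```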

Well, we have not yet decided whether we want to put a bunch of “_parallel” functions into Stan, but that would currently be the idea, yes.

So what I was suggesting is to introduce a “rootsolve_parallel” function in addition to a “rootsolve” function. That can only be done easily if the use of the AD system can be disabled during the root-solving run. Hence, the calls to the jacobian function, which currently work by using AD, would have to be replaced with analytic variants. We can quickly discuss this on Monday.

However, as I said, I am not sure whether these “_parallel” variants of numerically costly functions will be part of Stan; my understanding is that we all still need to agree on this, as with all additions to the language.

Sebastian

Got you. Thanks for clarifying.

@wds15 Any parallel code is still a future consideration and not a strong factor in the design here. Even updating the design of the ODE code to admit analytic Jacobians is still in its nascent stages.

I’m trying to use the Jacobian determined by the implicit function theorem…

I know I can act on the adjoint of a parameter using: parms(i).vi_->adj_ += ... as I’ve seen in the code for quad_form_sym() (I picked a function that returns a matrix under rev as a guiding example). But it doesn’t make sense to compute one adjoint per parameter, because the output of algebra_solver() is a vector and the log posterior is not explicitly defined.

Ok, so clearly another approach is needed. Or can I capture the information contained in the Jacobian using only the adjoints of the parameters?

Why aren’t you using the Jacobian functionals?

Let me clarify: I am using the Jacobian functional, and I computed a variable Jx_p (the Jacobian of the solution w.r.t. the parameters).

I’m unclear on what to do next, i.e. how to use this variable in the AD chain. The functions I have found in the Stan code for which the gradients are worked out analytically usually map to R, not R^n.

You mean that you can compute the initial Jacobians, take the inverse and construct the implicit Jacobian, but you don’t know how to return roots and the implicit Jacobian as autodiff variables?

I wouldn’t have any difficulty implementing what you describe, but I don’t understand why this is what we want to do.

Our goal is to compute the gradient of the log posterior with respect to the parameters. In AD, this information is stored in the adjoints of the parameters, so I’m under the impression that I need to modify the adjoints of the input parameters (i.e. the auxiliary parameters). However, one adjoint per parameter is not enough to capture the information in the implicit Jacobian.

Your comment suggests my focus should be on the roots and the implicit Jacobian, not the input parameters.

I believe that you are confused about how autodiff works, in which case I highly recommend going over the autodiff paper again. In AD, each variable in the expression graph is augmented with some additional information: tangents in forward mode and adjoints in reverse mode. Once this information has been propagated through the entire expression graph, you get components of directional derivatives in forward mode or components of the gradient in reverse mode.

The information is propagated by grabbing the inputs or outputs of a given function, multiplying by the partials of that function, and passing along the result. For example, in forward mode, if we had a function with N inputs and 1 output, we’d take the N tangents from the input parameters, multiply each by the corresponding partial df/dx_i, and then add those products up as the tangent of the function output, which then gets passed along. In reverse mode we do the opposite: we take the adjoint from the output, multiply it by the partials, and pass the results to the inputs.
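In symbols, with tangents written as dots and adjoints as bars:

```latex
% Forward mode: the output tangent is the partials-weighted sum of the input
% tangents. Reverse mode runs the same partials in the opposite direction,
% accumulating into each input's adjoint.
\[
  \dot{f} = \sum_{i=1}^{N} \frac{\partial f}{\partial x_i}\, \dot{x}_i ,
  \qquad
  \bar{x}_i \leftarrow \bar{x}_i + \frac{\partial f}{\partial x_i}\, \bar{f} .
\]
```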

Following this logic, for a reverse mode implementation of a root finder you’d take the adjoints of the roots, multiply them by the appropriate partials, and pass those products along to the auxiliary parameters. If there are N roots and M auxiliary parameters you’ll need N x M partials, which is exactly the number of components in the implicit Jacobian.
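As a bare-bones illustration of that update, stripped of Stan's vari machinery and with made-up names (plain Eigen only):

```cpp
// Reverse-mode update for the root finder, written with plain Eigen rather
// than Stan's vari classes (illustration only). J_implicit(i, j) = dx_i/dy_j
// is the N x M implicit Jacobian of the roots x with respect to the
// auxiliary parameters y.
#include <Eigen/Dense>

void chain_roots_to_params(const Eigen::VectorXd& root_adjoints,  // length N
                           const Eigen::MatrixXd& J_implicit,     // N x M
                           Eigen::VectorXd& param_adjoints) {     // length M
  // Each parameter adjoint accumulates sum_i (adjoint of root i) * dx_i/dy_j,
  // i.e. a transposed Jacobian-vector product.
  param_adjoints += J_implicit.transpose() * root_adjoints;
}
```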

Slap on the forehead! I now get it! Complementing your explanation, I found it very helpful to draw the expression graph for the root-solver. Awesome! @betanalpha Thank you for all your help and patience!

No problem!

Hey Michael,

Could I talk you out of the TeX code used to generate the doc? I may want to use some of it in the ONR proposal and I’m being a bit lazy about typing it myself.

Thanks,
Bill

Can’t upload .tex to Discourse so I’ll mail it directly.