Regarding adjoint sensitivity of ODE discussed in today’s meeting.

Like @Bob_Carpenter pointed out, the concept is just like forward AD vs reverse AD. The gist of adjoint sensitivity is that given a function u constrained by

we are interested in some functional of u

and its sensitivity regarding \theta

An example would be u is the trajectory of a particle’s movement, and J is the energy dissipated along the way, from time t_0 to t_1. One way to find adjoint is through optimal control perspective, where J and the constraint in the constrained optimization problem

are used to construct Lagrangian

Then the adjoint is just the Lagrange multiplier \lambda that satisfies

After solving for \lambda, one can calculate the sensitivity

From the last equation, we can interpret adjoint as a measure of the sensitivity of the constraint, i.e., how much F_{\theta} contributes to the total change of J.

The advantage of using adjoint is that its solution is independent of the the size of parameter \theta, so that if \dim(\theta) >> \dim(J), we’ll gain from solving only once the adjoint system.

In the context of ODE, the constraint F would be the actually ODE to be solved, only this time \lambda would be time dependent, just like u, and we need solve both the prime and the adjoint ODEs. This is actually easier than it sounds, because the Jacobian of adjoint system F_u^T is the transpose of the Jacobian of the prime system, so the two ODEs share same property of stability, and in general we can use a same scheme to treat them.

CVODES/IDAS are able to solve both the forward prime problem and the backward adjoint problem. Where Stan’s AD comes in is when we need Jacobian F_u and J_u for adjoint ODE. In particular, we don’t explicitly need F_u but only F_u^T\lambda, the directional derivative in \lambda direction, this can be solved directly in AD. As a matter of fact, CVODES/IDAS provide an AD interface for this purpose.

For PDE the idea is the same, except this time the evaluation of F_u^\lambda is much more complex. Because now unlike in ODE, F is not naturally in R^n, with n the size of ODE, but in certain function space. There’s a lot effort goes into discretize F into a finite-dimensional version F_h, before all the AD magit can take place. This is why code like ADIFOR takes another approach of AD, by transforming the entire process of F\rightarrow F_h into an AD version, so we can go ahead calculating F_{hu}^T\lambda. The bottom line is that for PDE there is no easy way to get F_u out of box by using Stan Math on certain functor, and we have to rely on external PDE solvers for that. This is behind the logic of forward_pde. The function expects a user-supplied function calculating J(u(\theta)) and dJ/d\theta, assuming u is a solution from external ODE/PDE solvers. Another related item is adj_jac_apply by @Bob_Carpenter and @bbbales2, where user-supplied F_u can be incorporated into AD chain easily. I’m working using `adj_jac_apply`

as backend of `forward_pde`

.