I keep talking about it in meetings, but I’ve given very little info on exactly how the adj_jac_apply thing @Bob_Carpenter and I have been working on actually works.
I’m gonna start a thread to pull the covers off that one a bit and start showing it off. There are various people I want to brainwash into using it. Some people know I want to brainwash them (@yizhang and @anon75146577), and some do not (@anon79882417), but I’ll keep @-ing people as this develops and gets fancier.
It’s still sitting as a pull request: https://github.com/stan-dev/math/pull/964 , so it isn’t totally ready to use yet, but here is an example of using it to re-implement solving Ax = b: https://gist.github.com/bbbales2/523db7848df48117dd39788815dfe1a2
What makes adj_jac_apply different from the other ways of implementing autodiff (there’s a sketch of the interface just after this list):
- You don’t need to compute the full Jacobian of your operator, just a vector^T Jacobian product (which, if your Jacobian is sparse, can be faster than computing the full Jacobian)
- Each adj_jac_apply call creates only one vari on the chaining autodiff stack, no matter how many vars your function outputs
- It automatically handles input types that are mixes of var and double arguments
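
To make that concrete, here’s roughly what one of these functors looks like, modeled on the elementwise-sin example in the inline docs. Treat the exact names and signatures (needs_adj, multiply_adjoint_jacobian, the arena accessor) as provisional while the pull is in flight:

```cpp
#include <stan/math.hpp>
#include <array>
#include <cmath>
#include <cstddef>
#include <tuple>

// Elementwise sin written as an adj_jac_apply functor (a sketch modeled
// on the SinFunctor example in the inline docs; details may shift).
struct SinFunctor {
  int N_;
  double* x_mem_;  // inputs stashed on the autodiff arena for the reverse pass

  // Forward pass: receives the double values of the inputs and returns
  // double values. needs_adj[i] says whether input i was a var (and so
  // needs adjoints propagated back to it).
  template <std::size_t size>
  Eigen::VectorXd operator()(const std::array<bool, size>& needs_adj,
                             const Eigen::VectorXd& x) {
    N_ = x.size();
    // Save x for the reverse pass on the arena (the exact spelling of the
    // arena accessor has moved around between Math versions).
    x_mem_ = stan::math::ChainableStack::instance()
                 .memalloc_.alloc_array<double>(N_);
    for (int n = 0; n < N_; ++n)
      x_mem_[n] = x(n);
    return x.array().sin();
  }

  // Reverse pass: given the adjoints of the outputs, return one
  // vector^T Jacobian product per input. Here the Jacobian is diagonal
  // with entries cos(x_n), so we never actually build it.
  template <std::size_t size>
  std::tuple<Eigen::VectorXd> multiply_adjoint_jacobian(
      const std::array<bool, size>& needs_adj,
      const Eigen::VectorXd& adj) const {
    Eigen::VectorXd adj_times_jac(N_);
    for (int n = 0; n < N_; ++n)
      adj_times_jac(n) = adj(n) * std::cos(x_mem_[n]);
    return std::make_tuple(adj_times_jac);
  }
};
```

And calling it is just:

```cpp
Eigen::Matrix<stan::math::var, Eigen::Dynamic, 1> x(2);
x << 1.0, 2.0;
auto y = stan::math::adj_jac_apply<SinFunctor>(x);
```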
There’s a lot more documentation inline: https://github.com/stan-dev/math/blob/a196caf6bd627950327b03737ea29f123989936a/stan/math/rev/mat/fun/adj_jac_apply.hpp#L368
The major limitation of adj_jac_apply is that if you want to save variables between the forward pass (operator()) and the reverse pass (chain), you gotta do that using the autodiff memory arena. That means managing the raw memory of your Eigen types yourself (this is shown in the solve example above, and sketched below).
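
For flavor, here’s the shape of that arena dance. The helpers save_to_arena / load_from_arena are hypothetical (just illustration, not Math functions), and the memalloc_ accessor spelling is from around the time of the pull, so it may differ in your checkout:

```cpp
#include <stan/math.hpp>
#include <Eigen/Dense>

// Hypothetical helper: copy an Eigen matrix into raw arena memory in the
// forward pass. No delete/free needed later -- the whole arena is
// recovered when the autodiff stack is cleared.
inline double* save_to_arena(const Eigen::MatrixXd& A) {
  double* mem = stan::math::ChainableStack::instance()
                    .memalloc_.alloc_array<double>(A.size());
  Eigen::Map<Eigen::MatrixXd>(mem, A.rows(), A.cols()) = A;
  return mem;
}

// Hypothetical helper: view the raw arena memory as a matrix again in the
// reverse pass (a Map, so no allocation or copying happens here).
inline Eigen::Map<const Eigen::MatrixXd> load_from_arena(const double* mem,
                                                         int rows, int cols) {
  return Eigen::Map<const Eigen::MatrixXd>(mem, rows, cols);
}
```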
It also only works with reverse mode: no fvars or higher-order autodiff.