Adding gradients - operands and partials vs. fwd/rev


When adding gradients for a distribution/function, is there any preference between the use of operands_and_partials or separate definitions for fwd and rev?

It seems like operands_and_partials would lead to less code overall, but is there a performance cost?


Don’t even have to mess with autodiff types with adj_jac_apply: Adj_jac_apply – Yaaaay!

Another example: Adj_jac_apply

It only works for reverse mode though.


I’ve been having a look at that, nice bit of coding!

If we start using adj_jac_apply to add gradients, how will that be affected when RHMC and forward mode are needed? Will there need to be a separate definition of the gradients for use with forward mode?


This formulation is specifically for efficient reverse mode autodiff. In its current state, it won’t help forward/mixed mode autodiff :(.
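To make the reverse-mode idea concrete, here is a minimal sketch of an adjoint-Jacobian product for softmax, written on plain doubles. The struct and method names (`SoftmaxOp`, `forward`, `adjoint_jacobian_product`) are hypothetical and are not the adj_jac_apply interface; the point is only that the reverse pass needs J^T times the output adjoints and never has to build the full Jacobian.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical illustration only; these names are not part of Stan Math.
struct SoftmaxOp {
  std::vector<double> y_;  // forward result, saved for the reverse pass

  // Forward pass on plain doubles: y_i = exp(x_i) / sum_j exp(x_j).
  std::vector<double> forward(const std::vector<double>& x) {
    assert(!x.empty());
    double max_x = x[0];
    for (double v : x) {
      if (v > max_x) max_x = v;
    }
    double total = 0.0;
    y_.assign(x.size(), 0.0);
    for (std::size_t i = 0; i < x.size(); ++i) {
      y_[i] = std::exp(x[i] - max_x);  // shift by the max for stability
      total += y_[i];
    }
    for (double& v : y_) v /= total;
    return y_;
  }

  // Reverse pass: returns J^T * adj_y without forming the N x N Jacobian.
  // For softmax, dy_i/dx_j = y_i * (delta_ij - y_j), so
  // (J^T adj_y)_j = y_j * (adj_y_j - sum_i adj_y_i * y_i).
  std::vector<double> adjoint_jacobian_product(
      const std::vector<double>& adj_y) const {
    assert(adj_y.size() == y_.size());
    double dot = 0.0;
    for (std::size_t i = 0; i < y_.size(); ++i) dot += adj_y[i] * y_[i];
    std::vector<double> adj_x(y_.size());
    for (std::size_t j = 0; j < y_.size(); ++j) {
      adj_x[j] = y_[j] * (adj_y[j] - dot);
    }
    return adj_x;
  }
};
```

Nothing here involves an autodiff type, which is why this style is convenient for reverse mode. It is also why it gives forward mode nothing to work with: a forward pass needs J times a tangent vector, which this operation does not define.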


Yes, there’s a bit of a performance cost for operands_and_partials. The important thing is to get things working first, then we can optimize later.

But I’ll second @bbbales2’s comment about adjoint-Jacobian-apply. That’ll make the reverse mode efficient.

In most cases, we just use the templated definition in prim for forward-mode. It’s not the most efficient approach, but it works. When we actually start using forward mode in practice, we’ll want to start improving forward mode efficiency. We haven’t spent much time there at all.
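As a rough illustration of why a templated definition is enough for forward mode, here is a small self-contained sketch. `Dual` is a hypothetical stand-in for a forward-mode type (it is not Stan Math's own forward-mode class), and `x_exp_x` is a made-up example function. The same template is instantiated once with `double` and once with `Dual`, and the derivative falls out of the overloaded operators.

```cpp
#include <cmath>

// Hypothetical forward-mode number: a value plus a tangent (derivative).
struct Dual {
  double val;
  double tan;
};

inline Dual operator+(Dual a, Dual b) { return {a.val + b.val, a.tan + b.tan}; }
inline Dual operator-(Dual a, Dual b) { return {a.val - b.val, a.tan - b.tan}; }
// Product rule: (ab)' = a'b + ab'.
inline Dual operator*(Dual a, Dual b) {
  return {a.val * b.val, a.tan * b.val + a.val * b.tan};
}
// Chain rule for exp: (exp(a))' = exp(a) * a'.
inline Dual exp(Dual a) {
  double e = std::exp(a.val);
  return {e, e * a.tan};
}

// One templated definition, with no derivative code of its own.
// For double it resolves to std::exp; for Dual the overload above is
// found by argument-dependent lookup.
template <typename T>
T x_exp_x(const T& x) {
  using std::exp;
  return x * exp(x);
}
```

Calling `x_exp_x(Dual{1.0, 1.0})` seeds the tangent with 1 and returns both the value e and the derivative (1 + x) e^x = 2e at x = 1. Every intermediate operation propagates its own tangent, which is the reason this route works but is not the most efficient one.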