Yes, there’s a bit of a performance cost for operands_and_partials. The important thing is to get things working first; then we can optimize later.
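But I’ll second @bbbales2’s comment about adjoint-Jacobian-apply. That’ll make reverse mode efficient, because it computes the adjoint-Jacobian product directly instead of building the full Jacobian first.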
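To make the reverse-mode point concrete, here’s a standalone C++ sketch. It is not Stan Math’s actual interface; `softmax`, `vjp_full_jacobian`, and `vjp_direct` are hypothetical names for illustration only. For softmax, the Jacobian is J = diag(y) - y yᵀ. The naive reverse pass builds that N x N matrix and multiplies by the adjoint (O(N²)), while the adjoint-Jacobian-apply style computes Jᵀ·adj directly in O(N) with the same result.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Forward pass: numerically stable softmax, y_i = exp(x_i - m) / sum_j exp(x_j - m).
std::vector<double> softmax(const std::vector<double>& x) {
  double m = x[0];
  for (double v : x) if (v > m) m = v;
  std::vector<double> y(x.size());
  double s = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    y[i] = std::exp(x[i] - m);
    s += y[i];
  }
  for (double& v : y) v /= s;
  return y;
}

// Naive reverse pass (hypothetical): materialize J_ij = y_i * (delta_ij - y_j),
// then compute J^T * adj. O(N^2) time and memory.
std::vector<double> vjp_full_jacobian(const std::vector<double>& y,
                                      const std::vector<double>& adj) {
  assert(y.size() == adj.size());
  const std::size_t n = y.size();
  std::vector<std::vector<double>> J(n, std::vector<double>(n));
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j)
      J[i][j] = (i == j ? y[i] : 0.0) - y[i] * y[j];
  std::vector<double> out(n, 0.0);
  for (std::size_t j = 0; j < n; ++j)
    for (std::size_t i = 0; i < n; ++i)
      out[j] += J[i][j] * adj[i];
  return out;
}

// Adjoint-Jacobian-apply style (hypothetical): never form J.
// (J^T adj)_j = y_j * (adj_j - sum_i adj_i * y_i). O(N) time and memory.
std::vector<double> vjp_direct(const std::vector<double>& y,
                               const std::vector<double>& adj) {
  assert(y.size() == adj.size());
  double dot = 0.0;
  for (std::size_t i = 0; i < y.size(); ++i) dot += adj[i] * y[i];
  std::vector<double> out(y.size());
  for (std::size_t j = 0; j < y.size(); ++j) out[j] = y[j] * (adj[j] - dot);
  return out;
}
```

Both functions return the same vector; the direct version just skips the quadratic intermediate, which is where the reverse-mode savings come from.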
In most cases, we just use the templated definition in prim for forward mode. It’s not the most efficient approach, but it works. Once we actually start using forward mode in practice, we’ll want to improve its efficiency; we haven’t spent much time there at all.
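For anyone wondering why the prim template “just works” for forward mode: a function written generically in its scalar type can be instantiated with a dual-number type, and the derivative falls out of the overloaded arithmetic. The sketch below uses a hand-rolled `Dual` struct as a stand-in for Stan’s `fvar<T>`; `Dual` and `f` are hypothetical and this is not Stan’s implementation.

```cpp
#include <cmath>

// Hypothetical minimal dual number: val is the value, tan the tangent (derivative).
struct Dual {
  double val;
  double tan;
};

// Arithmetic propagates tangents via the sum and product rules.
Dual operator+(Dual a, Dual b) { return {a.val + b.val, a.tan + b.tan}; }
Dual operator*(Dual a, Dual b) {
  return {a.val * b.val, a.tan * b.val + a.val * b.tan};
}

// Elementary functions propagate tangents via the chain rule.
Dual exp(Dual a) {
  double e = std::exp(a.val);
  return {e, e * a.tan};
}
Dual sin(Dual a) { return {std::sin(a.val), std::cos(a.val) * a.tan}; }

// A "prim"-style function written once, generic in the scalar type T.
// using std::exp/sin lets T = double pick the std versions, while
// argument-dependent lookup picks the Dual overloads when T = Dual.
template <typename T>
T f(const T& x) {
  using std::exp;
  using std::sin;
  return x * exp(x) + sin(x);
}
```

Calling `f(Dual{x, 1.0})` returns f(x) in `val` and f'(x) = eˣ + x·eˣ + cos(x) in `tan`. It works without any forward-mode-specific code, but every intermediate carries a tangent through generic operations, which is why specialized forward-mode implementations can be faster.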