Yup, that's the right way to multiply it out. I think the overall message got lost, though. If I have a function $f : \mathbb{R}^N \to \mathbb{R}^M$, I can compute its $M \times N$ Jacobian either row by row, using $M$ reverse-mode passes (one per output), or column by column, using $N$ forward-mode passes (one per input).
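A minimal pure-Python sketch of the two strategies, assuming a toy $f : \mathbb{R}^3 \to \mathbb{R}^2$ built only from `+` and `*`. The classes `Dual` and `Var` and the helpers `jacobian_forward` / `jacobian_reverse` are illustrative names, not any library's API. Forward mode seeds one input at a time and gets one column per pass; reverse mode records the computation once and then sweeps backward once per output to get one row.

```python
# Minimal sketch: Jacobian of f : R^N -> R^M by forward mode (N passes, one
# column each) and by reverse mode (one recording, then M backward sweeps).

class Dual:
    """Forward mode: a value plus its derivative along one seed direction."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __radd__, __rmul__ = __add__, __mul__


class Var:
    """Reverse mode: a value plus (parent, local partial) edges to its inputs."""
    def __init__(self, val, parents=()):
        self.val, self.parents = val, parents

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.val + other.val, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.val * other.val, ((self, other.val), (other, self.val)))

    __radd__, __rmul__ = __add__, __mul__


def topo_order(out):
    """Post-order DFS: every node appears after all of its parents."""
    order, seen = [], set()

    def visit(n):
        if id(n) not in seen:
            seen.add(id(n))
            for parent, _ in n.parents:
                visit(parent)
            order.append(n)

    visit(out)
    return order


def jacobian_forward(f, x):
    cols = []
    for j in range(len(x)):  # N passes, seeding input j with derivative 1
        seeded = [Dual(v, 1.0 if i == j else 0.0) for i, v in enumerate(x)]
        cols.append([y.dot for y in f(seeded)])  # column j: dy_i/dx_j
    return [list(row) for row in zip(*cols)]     # transpose to M x N


def jacobian_reverse(f, x):
    xs = [Var(v) for v in x]
    ys = f(xs)                   # record the computation graph once
    rows = []
    for y in ys:                 # M backward sweeps, one per output
        adj = {id(y): 1.0}       # adjoint of each node, keyed by identity
        for node in reversed(topo_order(y)):
            g = adj.get(id(node), 0.0)
            for parent, local in node.parents:
                adj[id(parent)] = adj.get(id(parent), 0.0) + g * local
        rows.append([adj.get(id(v), 0.0) for v in xs])  # row i: dy_i/dx_j
    return rows


def f(x):  # toy example: R^3 -> R^2
    return [x[0] * x[1], x[1] * x[2] + x[0]]


print(jacobian_forward(f, [2.0, 3.0, 5.0]))  # [[3.0, 2.0, 0.0], [1.0, 5.0, 3.0]]
print(jacobian_reverse(f, [2.0, 3.0, 5.0]))  # [[3.0, 2.0, 0.0], [1.0, 5.0, 3.0]]
```

Here $N = 3$ and $M = 2$, so forward mode takes three passes and reverse mode two backward sweeps; both produce the same matrix.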
Each forward-mode pass should be cheaper than a reverse-mode pass, but how much cheaper depends on the problem. So the choice of which to use depends on the problem and on the relative sizes of $M$ and $N$.
Reverse mode is faster only if $M \ll N$. If they're roughly the same size, or $M$ is larger, forward mode should be more efficient. The exact breakeven point will depend on the function being evaluated.
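A rough back-of-the-envelope model of that breakeven point, under stated assumptions rather than measurements: suppose one forward pass costs about $c_F$ evaluations of $f$ and one reverse pass about $c_R$ evaluations, where $c_F$ and $c_R$ are hypothetical constants that depend on the AD implementation and on $f$ itself.

$$
\text{cost}_{\text{fwd}} \approx N \, c_F \, \text{cost}(f), \qquad
\text{cost}_{\text{rev}} \approx M \, c_R \, \text{cost}(f)
$$

$$
\text{cost}_{\text{rev}} < \text{cost}_{\text{fwd}} \iff \frac{M}{N} < \frac{c_F}{c_R}
$$

If each forward pass is the cheaper one ($c_F < c_R$), the right-hand ratio is below 1, so reverse mode only pays off once $M$ is sufficiently smaller than $N$, and how much smaller depends on that ratio for the particular function.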
Forward mode should also use much less memory, which is another consideration, and there's no thread contention with forward mode because there's no global shared object.
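To make the memory point concrete, here is a hedged Python sketch computing $d(x^{k+1})/dx$ both ways by repeated multiplication. The forward-mode `Dual` holds only a value and a derivative, so its live state does not grow with $k$. The reverse-mode `Tape` is a hypothetical minimal design, not any particular library: it records one entry per operation and must keep all of them until the backward sweep, so its size grows linearly with $k$. The tape here is local to each call; an implementation that instead shared one global tape across calls would need coordination between threads, which is the contention forward mode avoids. `power_forward` and `power_reverse` are illustrative names.

```python
# Sketch: live AD state for d(x**(k+1))/dx computed by repeated multiplication.

class Dual:
    """Forward mode: value and derivative travel together; nothing is stored."""
    __slots__ = ("val", "dot")

    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __mul__(self, other):
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)


class Tape:
    """Reverse mode (hypothetical minimal design): one record per operation."""
    def __init__(self):
        self.vals = []  # value of every node ever created
        self.ops = []   # None for inputs, else (a, d_out/d_a, b, d_out/d_b)

    def leaf(self, val):
        self.vals.append(val)
        self.ops.append(None)
        return len(self.vals) - 1

    def mul(self, a, b):
        va, vb = self.vals[a], self.vals[b]
        self.vals.append(va * vb)
        self.ops.append((a, vb, b, va))
        return len(self.vals) - 1

    def grad(self, out):
        """Backward sweep; needs every record kept since the forward pass."""
        adj = [0.0] * len(self.vals)
        adj[out] = 1.0
        for i in range(out, -1, -1):  # reverse creation order is topological
            if self.ops[i] is not None:
                a, da, b, db = self.ops[i]
                adj[a] += adj[i] * da
                adj[b] += adj[i] * db
        return adj


def power_forward(x, k):
    d = Dual(x, 1.0)
    y = d
    for _ in range(k):
        y = y * d          # the previous y can be discarded immediately
    return y.val, y.dot


def power_reverse(x, k):
    tape = Tape()          # local to this call, so nothing is shared
    xi = tape.leaf(x)
    yi = xi
    for _ in range(k):
        yi = tape.mul(yi, xi)
    adj = tape.grad(yi)
    return tape.vals[yi], adj[xi], len(tape.vals)  # tape length is k + 1


print(power_forward(2.0, 3))  # (16.0, 32.0)
print(power_reverse(2.0, 3))  # (16.0, 32.0, 4)
```

Both return the same value and derivative, but only the reverse-mode version has to keep a record of every intermediate step alive until the end.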