@syclik is incorrect about the relative performance of forward mode. For a general function f: R^{N} -> R^{M}, the cost to compute the full Jacobian will be O(M) for reverse mode and O(N) for forward mode. In other words, for reverse mode you have to sweep backwards M times to handle all of the outputs, whereas in forward mode you have to sweep forwards N times to handle all of the inputs.
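To make the sweep counts concrete, here is a minimal pure-Python sketch. It is a toy illustration, not how Stan's autodiff is implemented: the names `Dual`, `Var`, `jacobian_forward`, `jacobian_reverse`, and the example function `f` are all hypothetical, and only `+` and `*` are supported.

```python
class Dual:
    """Forward mode: a value carried together with its tangent."""
    def __init__(self, val, tan=0.0):
        self.val, self.tan = val, tan

    def __add__(self, o):
        return Dual(self.val + o.val, self.tan + o.tan)

    def __mul__(self, o):
        return Dual(self.val * o.val, self.tan * o.val + self.val * o.tan)


class Var:
    """Reverse mode: a node recorded on a tape with its local partials."""
    def __init__(self, val, parents, tape):
        self.val, self.parents, self.tape = val, parents, tape
        self.adj = 0.0
        tape.append(self)  # the whole expression graph is kept in memory

    def __add__(self, o):
        return Var(self.val + o.val, ((self, 1.0), (o, 1.0)), self.tape)

    def __mul__(self, o):
        return Var(self.val * o.val, ((self, o.val), (o, self.val)), self.tape)


def jacobian_forward(f, x):
    """One forward sweep per input: each sweep yields one Jacobian column."""
    n = len(x)
    cols = []
    for j in range(n):
        seeded = [Dual(v, 1.0 if i == j else 0.0) for i, v in enumerate(x)]
        cols.append([y.tan for y in f(seeded)])
    m = len(cols[0])
    jac = [[cols[j][i] for j in range(n)] for i in range(m)]
    return jac, n  # n forward sweeps


def jacobian_reverse(f, x):
    """One backward sweep per output: each sweep yields one Jacobian row."""
    tape = []
    xs = [Var(v, (), tape) for v in x]
    ys = f(xs)  # single forward pass that records values and partials
    jac = []
    for y in ys:
        for node in tape:
            node.adj = 0.0
        y.adj = 1.0
        for node in reversed(tape):  # tape order is a topological order
            for parent, partial in node.parents:
                parent.adj += partial * node.adj
        jac.append([v.adj for v in xs])
    return jac, len(ys)  # len(ys) backward sweeps


def f(x):
    """Toy function R^3 -> R^2 (an assumption chosen for illustration)."""
    return [x[0] * x[1] + x[2], x[0] * x[2] * x[2]]


J_fwd, fwd_sweeps = jacobian_forward(f, [1.0, 2.0, 3.0])
J_rev, rev_sweeps = jacobian_reverse(f, [1.0, 2.0, 3.0])
print(J_fwd == J_rev, fwd_sweeps, rev_sweeps)
```

With N = 3 inputs and M = 2 outputs, both routines produce the same Jacobian, but the forward version runs three sweeps and the reverse version runs two.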
There are also differences in the coefficients. In particular, forward mode doesn’t need to save the entire expression graph in memory like reverse mode does, so it will be a little bit faster per sweep. Another way of thinking about it is that reverse mode actually requires one forward sweep to propagate the function values so that the partials can be computed, whereas forward mode can do that at the same time it’s propagating the tangents.