Following up on the question posted here (I can’t reply to that post, because apparently my registration here is too new): http://discourse.mc-stan.org/t/repeated-ad-of-function-with-varying-arguments/105?u=akuz

The question was about computing the gradient of a function f: R^N → R at many input points while reusing the autodiff stack. The answer was that reusing the stack is not possible.

However, we can also treat the evaluation at M input points as a single multivariate function from R^N to R^M. According to Wikipedia, when M >> N, forward-mode autodiff is more efficient: https://en.m.wikipedia.org/wiki/Automatic_differentiation
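To make the setup concrete, here is a small sketch in JAX (not Stan Math; the function f and the shapes are illustrative assumptions). It stacks evaluations of f at M data points with shared parameters theta into one map F: R^N → R^M, then computes the full Jacobian both ways: forward mode needs N passes, reverse mode needs M passes, so forward mode wins when M >> N.

```python
import jax
import jax.numpy as jnp

# Hypothetical example: f(theta; x) = exp(x . theta), evaluated at M data
# points with shared parameters theta in R^N.  Stacking the M evaluations
# gives a single map F: R^N -> R^M.
N, M = 3, 100
key = jax.random.PRNGKey(0)
X = jax.random.normal(key, (M, N))   # M fixed data points
theta = jnp.ones(N)                  # shared parameters we differentiate wrt

def F(theta):
    return jnp.exp(X @ theta)        # shape (M,)

# Forward mode: one Jacobian-vector product per input, i.e. N passes.
J_fwd = jax.jacfwd(F)(theta)         # shape (M, N)

# Reverse mode: one vector-Jacobian product per output, i.e. M passes.
J_rev = jax.jacrev(F)(theta)         # shape (M, N)

# Both modes compute the same Jacobian; only the cost structure differs.
assert jnp.allclose(J_fwd, J_rev)
```

With N = 3 and M = 100, forward mode builds the whole M×N Jacobian in 3 passes where reverse mode needs 100, which is the asymmetry the Wikipedia article describes.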

So, is it correct that for computing the gradient at many input points (but with the same values of the parameters we are differentiating with respect to), it would be more efficient to use forward-mode autodiff?