Great! I agree that this should go in quickly. There are two choices to make going forward, which I tried to analyze using benchmarks. The choices are:
- map_rect(f, real eta, real[,] theta, real[,] x_r, int[,] x_i), which allows a shared parameter vector eta, or
- map_rect(f, real[,] theta, real[,] x_r, int[,] x_i), which does not allow a shared parameter vector eta;
- whether to add an additional map_rect_lpdf, which directly aggregates the user function output. This version can directly aggregate the function value and the partials of the shared parameter vector.
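To make the difference between the variants concrete, here is a minimal plain-Python sketch of what each one computes. It is illustrative only: it is not Stan code, it says nothing about how the MPI backend works, and the toy function `f_shared`, the helper `f_plain` and the sample numbers are made up for this example. It also treats eta as an array of shared parameters, which is an assumption on my side.

```python
# Reference semantics only; names and data below are hypothetical.

def map_rect_shared(f, eta, theta, x_r, x_i):
    """Variant with a shared eta: eta is passed once and reused by all J jobs."""
    return [f(eta, theta[j], x_r[j], x_i[j]) for j in range(len(theta))]


def map_rect_plain(f, theta, x_r, x_i):
    """Variant without eta: each row of theta must carry all parameters."""
    return [f(theta[j], x_r[j], x_i[j]) for j in range(len(theta))]


def map_rect_lpdf(f, eta, theta, x_r, x_i):
    """Summed variant: aggregate the per-job outputs into a single value."""
    return sum(map_rect_shared(f, eta, theta, x_r, x_i))


def f_shared(eta, theta_j, x_r_j, x_i_j):
    """Toy per-subject log-density term: two shared and one subject parameter."""
    mu = eta[0] + eta[1] * theta_j[0]
    return -0.5 * (x_r_j[0] - mu) ** 2


def f_plain(theta_j, x_r_j, x_i_j):
    """Same toy model, with the shared parameters packed in front of each row."""
    return f_shared(theta_j[:2], theta_j[2:], x_r_j, x_i_j)


eta = [1.0, 2.0]                    # shared across subjects
theta = [[0.5], [1.5], [-1.0]]      # one row per subject (J = 3)
x_r = [[2.0], [4.5], [-1.0]]        # real data per subject
x_i = [[], [], []]                  # no integer data in this toy case

# Without a shared argument, eta has to be copied into every row of theta.
theta_packed = [eta + row for row in theta]

per_subject_shared = map_rect_shared(f_shared, eta, theta, x_r, x_i)
per_subject_plain = map_rect_plain(f_plain, theta_packed, x_r, x_i)
total = map_rect_lpdf(f_shared, eta, theta, x_r, x_i)
```

All three give the same model; they differ only in how often eta is copied and in whether the sum is formed inside or outside the map.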
Now, the example I used is an analytic 2-cmt oral PK model which has 5 parameters in total. Of these, 4 are not subject-specific while 1 is subject-specific. I ran the program with J=64000 subjects on 20 cores. The timing for 1000 transitions using 10 leapfrog steps per transition is:
- 6300s for serial analytic MPI, cpu=4, shared, not summed
- 7100s for serial analytic MPI, cpu=4, not shared, not summed
- 6400s for serial analytic MPI, cpu=4, shared, summed
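For a quick read of these numbers, the following small Python calculation puts them on a relative scale. It only uses the three timings listed above; the assumption that one leapfrog step costs roughly one gradient evaluation is mine, and the dictionary keys are just labels.

```python
# Wall-clock timings from the list above, in seconds.
timings_s = {
    "shared, not summed": 6300,
    "not shared, not summed": 7100,
    "shared, summed": 6400,
}

transitions = 1000
leapfrog_per_transition = 10
# Assumption: one gradient evaluation per leapfrog step.
grad_evals = transitions * leapfrog_per_transition

baseline = timings_s["shared, not summed"]

# Approximate seconds per gradient evaluation for each variant.
per_grad_s = {name: t / grad_evals for name, t in timings_s.items()}

# Slowdown relative to the fastest run, as a fraction.
rel_slowdown = {name: t / baseline - 1.0 for name, t in timings_s.items()}

for name in timings_s:
    print(f"{name}: {per_grad_s[name]:.2f} s/gradient, "
          f"{100 * rel_slowdown[name]:+.1f}% vs. shared, not summed")
```

So dropping the shared vector costs roughly 13% here, while summing inside the map changes the runtime by under 2%.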
So either I have made a mistake, or this simply means that the gradient evaluation is no longer a bottleneck! Only the trick with a shared parameter vector gives some speedup, but considering the huge problem size I had to set in order to see a difference, we could opt for not supporting this for now. Hence, copying the shared parameter vector J times is not so costly, given that the gradient evaluation is made so much faster using MPI. The same seems to be true for the summed version (using the map_rect_lpdf function) as well. This may look different whenever higher-order AD is used, since then more aggregation can occur.
However, all in all, I would conclude that we should simply go with the originally proposed design of
map_rect(f, real[,] theta, real[,] x_r, int[,] x_i)
This is a lot simpler to implement, since there is only a single argument which can be a var… and any other optimization does not seem useful, since the gradient evaluation is no longer the speed-limiting operation. At least that would be my take.
Let’s discuss tomorrow.