The T[] x argument to reduce_sum is not really needed:
real reduce_sum(F f, T[] x, int grainsize, T1 s1, T2 s2, ...)
The data in T could be passed as part of the extra arguments (...). The current interface is actually cumbersome when the data are not easy to represent as a single vector; for example, the data might have one integer and one real per row. The partial_sum function doesn’t really need x_subset: the start and end arguments already provide enough information to evaluate the data subset.
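For illustration, a minimal sketch of that pattern (a hypothetical model; the dummy array, data, and parameter names are made up here) would slice over a dummy array and index the per-row int/real data, passed as shared arguments, with start:end:

```stan
functions {
  // hypothetical partial sum: the sliced argument is a dummy array,
  // and the real data (one int and one real per row) come in as
  // shared arguments, indexed with start:end
  real partial_ll(int[] dummy, int start, int end,
                  int[] y, vector x, real alpha, real beta) {
    return bernoulli_logit_lpmf(y[start:end] | alpha + beta * x[start:end]);
  }
}
data {
  int<lower=1> N;
  int<lower=0, upper=1> y[N];
  vector[N] x;
}
transformed data {
  int dummy[N] = rep_array(0, N);  // only defines the slicing, never used
}
parameters {
  real alpha;
  real beta;
}
model {
  alpha ~ normal(0, 1);
  beta ~ normal(0, 1);
  target += reduce_sum(partial_ll, dummy, 1, y, x, alpha, beta);
}
```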
I think it’s because the autodiff has to calculate the gradient of the output with respect to every input argument. Smaller input slice means less work.
I am skeptical that autodiff behavior is sensitive to what is in the function argument list. I believe autodiff follows the actual use of variables, not whether they are available in a particular function.
Why should we? You can just give reduce_sum a dummy count array as the sliced argument to basically kill it. That will not hurt you at all.
The slicing argument is intended to slice large chunks of autodiff variables.
Think about it: the shared arguments get copied as many times as the large sum is partitioned into partial sums, while the slicing argument is only doubled in memory. That is a very useful thing, and users are free to kill it with dummy arguments.
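As a rough sketch of that use case (hypothetical model and names, not taken from the thread), slicing a large parameter array while keeping the data and scalar parameters in the shared arguments might look like:

```stan
functions {
  // hypothetical partial sum that slices a big parameter array
  // (per-group effects); only the slice gets copied, while the data
  // and the scalar parameters ride along as shared arguments
  real partial_ll(real[] theta_slice, int start, int end,
                  int[] y, real mu, real sigma) {
    real lp = 0;
    for (i in start:end) {
      int j = i - start + 1;  // position within the slice
      lp += normal_lpdf(theta_slice[j] | mu, sigma)
            + poisson_log_lpmf(y[i] | theta_slice[j]);
    }
    return lp;
  }
}
data {
  int<lower=1> G;
  int<lower=0> y[G];  // one count per group
}
parameters {
  real mu;
  real<lower=0> sigma;
  real theta[G];      // large chunk of autodiff variables to slice
}
model {
  mu ~ normal(0, 1);
  sigma ~ normal(0, 1);
  target += reduce_sum(partial_ll, theta, 1, y, mu, sigma);
}
```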
I don’t think we need such a reduced interface for reduce_sum.
The sliced argument is always doubled in memory no matter what.
For the shared arguments it’s different for data vs. parameters: data in the shared arguments is never copied, since it is read-only, but for the parameters we have to make a full copy for every partial-sum evaluation.
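To put rough numbers on the difference: if the sum is split into M partial sums and the shared arguments contain K parameter scalars, the shared parameters cost on the order of M × K copies on the autodiff stack, whereas a sliced parameter array of N scalars costs only about 2 × N, independent of M.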