Slicing on both data and parameter vectors in reduce_sum

Suppose I have an n x 1 data vector y whose mean is an n x 1 parameter vector lambda_y, so lambda_y may be viewed as a latent response. I’d like to use reduce_sum() to slice on both (y, lambda_y), but partial_sum(y, start, end, lambda_y, ...) doesn’t explicitly allow slicing on both y and lambda_y. Can I slice on both, or must I choose between them? I saw a related thread that said it would be more efficient to slice on lambda_y because it passes by copy.

Lastly, I note that in my experience if lambda_y were data, then I could slice on both y and lambda_y with the above syntax for partial_sum() (perhaps related to passing by reference).

If the data are reals, then if you really want you can pack the data and parameters into a single array and then slice that. But I don’t think you will gain much by doing this; passing the full data out to partial_sum doesn’t carry much overhead.

Thank you for your help. I have learned a lot from your replies to issues and questions about reduce_sum(). Unfortunately y are integers. Are you saying that if I had to pick between slicing on y or lambda_y that it doesn’t really matter which I pick because passing the full vector of either doesn’t carry much overhead? Seems like passing the full lambda_y would carry more overhead.

Given the choice between slicing parameters and slicing data, it is better to slice parameters. However, the penalty for slicing parameters got substantially smaller a couple of versions ago, and it probably isn’t too important unless the size of the parameter vector reaches into the thousands.
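
A minimal sketch of slicing the parameters while still using both y and lambda_y in each partial sum. The Poisson likelihood, the gamma prior, and the names partial_sum and grainsize are assumptions for illustration, since the thread does not state the model. The sliced argument has to be an array, so lambda_y is declared as an array of reals here rather than a vector; y is passed whole as a shared argument and indexed with start:end so that it lines up with the slice.

```stan
functions {
  // lambda_slice is lambda_y[start:end]; y arrives whole and is indexed
  // with the same range so that data and parameters stay aligned.
  real partial_sum(array[] real lambda_slice,
                   int start, int end,
                   array[] int y) {
    return poisson_lpmf(y[start:end] | lambda_slice);
  }
}
data {
  int<lower=1> n;
  array[n] int<lower=0> y;
  int<lower=1> grainsize;
}
parameters {
  array[n] real<lower=0> lambda_y;
}
model {
  lambda_y ~ gamma(2, 1);  // placeholder prior (assumption)
  target += reduce_sum(partial_sum, lambda_y, grainsize, y);
}
```

To slice the data instead, swap the roles: make y the first argument of partial_sum and pass lambda_y as the shared argument, indexing it with start:end.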

Note that slicing the parameter vector rather than the data can produce chunks of very different size (e.g. if far less data is associated with one set of parameters than with another), which leaves the computation per thread imbalanced. This can be avoided by indexing into the parameter vector to produce a transformed parameter of the same length as the data, and slicing that. However, I haven’t tested enough to know whether/when this is faster than just slicing the data, or just slicing the original parameter vector and doing unequal amounts of computation per thread. @wds15 might have more advice here.
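
The indexing idea could look roughly like this. Everything model-specific is an assumption for illustration: J group-level parameters lambda, a group index g that maps each observation to its group, a Poisson likelihood, and a placeholder prior. Expanding lambda to length n with lambda[g] gives every slice the same number of observations, whatever the group sizes are.

```stan
functions {
  // lambda_slice is the expanded lambda_y[start:end], one entry per observation.
  real partial_sum(array[] real lambda_slice,
                   int start, int end,
                   array[] int y) {
    return poisson_lpmf(y[start:end] | lambda_slice);
  }
}
data {
  int<lower=1> n;                      // observations
  int<lower=1> J;                      // groups (hypothetical)
  array[n] int<lower=1, upper=J> g;    // group of each observation (hypothetical)
  array[n] int<lower=0> y;
  int<lower=1> grainsize;
}
parameters {
  array[J] real<lower=0> lambda;
}
transformed parameters {
  // Same length as the data: entry i is the parameter for observation i.
  array[n] real lambda_y = lambda[g];
}
model {
  lambda ~ gamma(2, 1);  // placeholder prior (assumption)
  target += reduce_sum(partial_sum, lambda_y, grainsize, y);
}
```

Declaring lambda_y in transformed parameters writes all n values to the output; building it as a local variable in the model block would avoid that.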

Sounds all good to me. I think the brms vignette is a good read on this:

https://cran.r-project.org/web/packages/brms/vignettes/brms_threading.html

However, I really think that by now reduce_sum is so well optimised that time spent on parameter vs data slicing is almost always wasted (given that you have found that things speed up and you have an idea of when the speedup saturates as a function of cores)… unless one fits the same model a gazillion times, of course. The time is better spent on thinking about the model itself, profiling it, or finding other ways of being clever about the problem at hand.

… and don’t forget to use the new CmdStan feature that runs all chains in a single go with one very large thread pool instead of independent runs for each chain…

Hi, is that an extra optimization feature the user needs to turn on, or is it automatic with the latest version of CmdStan? Thank you.

Don’t know. @rok_cesnovar ?