@wds15 has a design doc up documenting his proposal for parallel for and reduce using TBB and a refactored independent autodiff stack mechanism. Please put any comments on the specific proposal on the GitHub PR; this thread is just announcing it and explaining a little bit about the process we’re trying out.
Here’s the intro text:
This RFC will enhance parallelization facilities in stan-math.
At its core it implements:
- parallel for with automatic sharding
- parallel reduce with automatic sharding and vectorization
To get there, a few things need to be refactored and introduced:
- independent AD
- refactored nested AD
- Intel TBB is proposed to implement the parallel algorithms
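To make "parallel reduce with automatic sharding" concrete, here is a minimal sketch of the general idea. It is not the RFC's API and it does not use TBB or autodiff types: it assumes plain doubles and the C++ standard library only, and the names `sharded_reduce` and `sum_of_squares` are hypothetical, made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical helper (not stan-math, not TBB): split the index range [0, n)
// into contiguous shards, reduce each shard in its own task, then add up
// the partial results in shard order.
template <typename ShardFn>
double sharded_reduce(std::size_t n, std::size_t num_shards, ShardFn shard_fn) {
  // Never use more shards than items, and always use at least one.
  num_shards = std::max<std::size_t>(1, std::min(num_shards, n));
  const std::size_t base = n / num_shards;
  const std::size_t extra = n % num_shards;  // first `extra` shards get one more item

  std::vector<std::future<double>> partials;
  std::size_t begin = 0;
  for (std::size_t s = 0; s < num_shards; ++s) {
    const std::size_t end = begin + base + (s < extra ? 1 : 0);
    partials.push_back(std::async(std::launch::async, shard_fn, begin, end));
    begin = end;
  }

  double total = 0.0;
  for (auto& p : partials) total += p.get();  // combine step
  return total;
}

// Example reduction: sum of squares, with each shard handling x[b..e).
double sum_of_squares(const std::vector<double>& x, std::size_t num_shards) {
  return sharded_reduce(x.size(), num_shards,
                        [&x](std::size_t b, std::size_t e) {
                          double acc = 0.0;
                          for (std::size_t i = b; i < e; ++i) acc += x[i] * x[i];
                          return acc;
                        });
}
```

Here the caller picks the number of shards by hand; the "automatic" part of the proposal is that the scheduler would choose the shard sizes instead. Doing the same thing with autodiff variables is where the independent and nested AD refactoring listed above comes in.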
We’re trying out the design doc process, which we’ve copied and slightly adapted from the Rust community’s RFC process. The basic idea is that we’d like to get some of the more complicated designs formalized, and get a set of basic questions answered about each of them, before we start writing code. This is an opportunity for the entire community to comment and try to make the proposed design better. We’ll give it a minimum of about a week to air publicly and aim to reach a decision within a few weeks at most (that decision could take a variety of forms, including postponement, but we’ll seek some action and closure on the issue in that time period). Please see the design docs repo README (and the original Rust one if curious) for more details.
@syclik, do you have time to be the reviewer for this one? We definitely need to get your comments if possible as this is a change to the math library more involved than just adding new functions and classes.