Thanks to @sakrejda, the PR for the `map_rect_concurrent` implementation is reviewed and ready to be merged. However, before hitting “merge” I wanted to get everyone informed and on board with the decisions taken. In particular, these are:
- `map_rect_serial` is dropped, as the concurrent version will simplify to a serial version if `STAN_THREADS` is not defined during compilation.
- The number of threads created is (roughly) controlled by the environment variable `STAN_NUM_THREADS` (just like for OpenMP, which uses
- Threading stuff is only ever used if things are compiled with
- Not setting `STAN_NUM_THREADS` or setting it to a non-numeric value results in a single thread being used
- Setting `STAN_NUM_THREADS` to a positive value means using that many threads (at most)
- Setting it to `-1` means using as many threads as we have CPUs on the machine
- Having fewer jobs to execute than threads means the number of threads is reduced to the number of jobs
- Anything else will just use 1 thread
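To make the conventions above concrete, here is a minimal sketch of how the thread count could be resolved. It is not the code from the PR: `resolve_num_threads` is a hypothetical helper, and passing the raw environment value and the CPU count as arguments is my own choice so the rules can be checked in isolation.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not from the PR): applies the conventions listed above.
//   raw              value of STAN_NUM_THREADS, or nullptr if it is unset
//   hardware_threads number of CPUs reported by the machine (may be 0)
//   num_jobs         number of jobs that need to be executed
inline int resolve_num_threads(const char* raw, int hardware_threads,
                               int num_jobs) {
  int limit = 1;  // unset, non-numeric, and "anything else" all mean 1 thread
  if (raw != nullptr) {
    try {
      const std::string s(raw);
      std::size_t pos = 0;
      const int n = std::stoi(s, &pos);
      if (pos == s.size()) {  // reject trailing junk such as "4x"
        if (n > 0) {
          limit = n;  // positive value: use at most that many threads
        } else if (n == -1) {
          limit = std::max(1, hardware_threads);  // -1: one thread per CPU
        }
      }
    } catch (const std::exception&) {
      // std::stoi throws on non-numeric or out-of-range input; keep limit = 1
    }
  }
  // Never use more threads than there are jobs, and never fewer than one.
  return std::max(1, std::min(limit, num_jobs));
}
```

In real use the first two arguments would presumably come from `std::getenv("STAN_NUM_THREADS")` and `std::thread::hardware_concurrency()`; note that the latter is allowed to return 0, hence the `std::max(1, ...)` guard.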
I hope everyone is fine with these conventions. I suggest we discuss this on Thursday (unless everyone is happy with this right away).
Note that because I am using the C++11 `async` facility, I can only control how many chunks of work I generate. This should translate into how many threads are created, but I have no control over that, since it is decided by the implementation. This is why the number of threads quoted above is the maximal number of threads (maybe we can improve on that with a thread pool in the future).
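As a rough illustration of the chunking idea, the sketch below splits the jobs into at most `num_threads` contiguous chunks and hands each chunk to `std::async`. This is only a toy under my own assumptions: `map_chunked` is a hypothetical name, the jobs are plain doubles rather than what `map_rect` really works on, and the `std::launch::async` policy is picked for the example and not necessarily what the PR does.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <future>
#include <vector>

// Hypothetical helper (not from the PR): applies f to every job, splitting the
// work into at most num_threads contiguous chunks, one std::async call each.
inline std::vector<double> map_chunked(const std::function<double(double)>& f,
                                       const std::vector<double>& jobs,
                                       int num_threads) {
  const std::size_t n = jobs.size();
  std::vector<double> out(n);
  if (n == 0) return out;

  // We only decide the number of chunks; how these map onto actual threads
  // is left to the C++ runtime.
  const std::size_t chunks
      = std::min<std::size_t>(std::max(1, num_threads), n);

  std::vector<std::future<void>> pending;
  for (std::size_t c = 0; c < chunks; ++c) {
    const std::size_t begin = c * n / chunks;
    const std::size_t end = (c + 1) * n / chunks;
    pending.push_back(
        std::async(std::launch::async, [&f, &jobs, &out, begin, end] {
          // Each chunk writes to its own disjoint slice of out.
          for (std::size_t i = begin; i < end; ++i) out[i] = f(jobs[i]);
        }));
  }
  for (auto& p : pending) p.get();  // wait for all chunks, rethrow exceptions
  return out;
}
```

Since the chunk count is capped at the number of jobs, this also covers the rule that fewer jobs than threads reduces the number of threads used.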
… and yes, within-chain parallelization in Stan is now super close to landing.
See the PR: