Proposed parallelism RFC - Stan language bits

wds15 · July 3, 2019, 6:49am

I don’t really care as long as it works nicely. If you are interested in the details, you may start here: Documentation Library

That’s easy, I think: Just look at the C++ bit above. The statement

return_t lpmf = stan::math::parallel_reduce_sum(
    count_iter(1), count_iter(elems + 1), return_t(0.0),
    [&](int start, int end) {
      return hierarchical_reduce(start, end, y, log_lambda_group, gidx, pstream__);
    },
    grainsize);

will internally start a parallel reduce from the TBB. The function returns only after the parallel reduce has completed in its entirety. Nesting a call to another parallel function inside it is not a problem, since the nested parallel region also runs to completion within the nested call. That said, it is not impossible to create deadlocks with these techniques, so using the TBB does not by itself make things safe. Nested parallelism in particular sometimes needs special consideration, see here: Documentation Library
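To make the "returns only after everything is done" behaviour concrete, here is a minimal sketch in plain C++. It is not how the TBB or Stan Math implements the reduce: `std::async` stands in for the TBB scheduler, and `blocking_reduce_sum`, `sum_range` and `nested_total` are hypothetical names made up for illustration. It assumes a half-open range `[start, end)` and `grainsize >= 1`.

```cpp
#include <functional>
#include <future>

// Hypothetical stand-in for a blocking parallel reduce (NOT the Stan Math or
// TBB API). Sums f(start, end) over sub-ranges of [start, end) that are no
// longer than grainsize. Assumes grainsize >= 1.
double blocking_reduce_sum(int start, int end, int grainsize,
                           const std::function<double(int, int)>& f) {
  if (end - start <= grainsize) {
    return f(start, end);  // small enough: evaluate the slice directly
  }
  const int mid = start + (end - start) / 2;
  // The left half runs on another thread, the right half on the calling thread.
  std::future<double> left = std::async(std::launch::async, [&, start, mid] {
    return blocking_reduce_sum(start, mid, grainsize, f);
  });
  const double right = blocking_reduce_sum(mid, end, grainsize, f);
  // get() blocks, so we only return once every slice has finished.
  return left.get() + right;
}

// Example slice function: sum of the integers in [start, end).
double sum_range(int start, int end) {
  double s = 0.0;
  for (int i = start; i < end; ++i) s += i;
  return s;
}

// Nested use: every outer element runs its own inner reduce to completion
// before the outer slice function returns.
double nested_total(int n_outer, int n_inner) {
  return blocking_reduce_sum(0, n_outer, 1, [&](int s, int e) {
    double acc = 0.0;
    for (int i = s; i < e; ++i) {
      acc += blocking_reduce_sum(0, n_inner, 4, sum_range);
    }
    return acc;
  });
}
```

One difference worth keeping in mind: in this sketch a waiting thread simply sleeps in `get()`, whereas a waiting TBB worker may pick up other tasks in the meantime. That work stealing is what makes nested parallelism with the TBB need extra care in some cases.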

The TBB offers a concept called “isolation”, which restricts a thread that is waiting inside an isolated region to running only tasks that belong to that region, and not unrelated tasks from elsewhere. I don’t think we need this feature, but it’s good that it is there.

@andrjohns This is off topic for this thread and it has been discussed as “super-nodes” in the context of the parallel design RFC here: adding parallel_autodiff RFC by wds15 · Pull Request #5 · stan-dev/design-docs · GitHub

In super-brevity:

- What I propose will allow us to create super-nodes easily (provided we implement sub-slicing of data in Stan-math, which I don’t think we should waste time on, since it is already implemented in Stan).
- map_rect puts some burden on users to program their function, but it gives them ultimate flexibility. The real problems with map_rect are that (i) it is user-unfriendly due to the packing/unpacking of arguments, (ii) it gives up vectorisation and (iii) it requires the user to pre-specify the sharding. The proposal addresses all of these points.
- A really good example of why the flexibility of a general reduce is needed is given by @paul.buerkner in this post: Parallel reduce in the Stan language - #11 by paul.buerkner

Super-nodes are useful, yes, but it’s far more useful to have a general thing since otherwise that super-node you need for your model is missing.
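For readers who have not used map_rect, here is a rough C++ sketch of the difference in what the user has to write. It is only an illustration under simplifying assumptions: `shard_lp` mimics the packed-argument style in a cut-down form (the real map_rect function also takes shared parameters and real-valued data), `slice_lp` mimics a slice-based partial sum similar in spirit to `hierarchical_reduce` above, and all names, the 0-based indexing and the dropped `-log(y!)` constant are choices made for this sketch.

```cpp
#include <cmath>
#include <vector>

// Poisson log-density of one count, up to the constant -log(y!) term.
double poisson_log_term(int y, double log_lambda) {
  return y * log_lambda - std::exp(log_lambda);
}

// Packed style (hypothetical, simplified): the user decides the shards up
// front and packs each shard's parameters and data into flat arrays, which
// the function then unpacks by position.
double shard_lp(const std::vector<double>& theta,  // packed parameters of one shard
                const std::vector<int>& x_i) {     // packed counts of one shard
  const double log_lambda = theta[0];  // unpack: slot 0 holds the shard's log-rate
  double lp = 0.0;
  for (int y : x_i) lp += poisson_log_term(y, log_lambda);
  return lp;
}

// Slice style (hypothetical): the shared containers are passed as they are
// and the function handles any half-open slice [start, end), so the caller
// is free to choose how the range gets chunked.
double slice_lp(int start, int end, const std::vector<int>& y,
                const std::vector<double>& log_lambda_group,
                const std::vector<int>& gidx) {
  double lp = 0.0;
  for (int i = start; i < end; ++i) {
    lp += poisson_log_term(y[i], log_lambda_group[gidx[i]]);
  }
  return lp;
}
```

With counts `y = {1, 2, 3, 4}` and group indices `gidx = {0, 0, 1, 1}`, the packed version needs one `shard_lp` call per pre-built shard, while `slice_lp(0, 4, ...)` or any split such as `slice_lp(0, 1, ...) + slice_lp(1, 4, ...)` gives the same total up to floating-point rounding, without re-packing anything.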

…and @syclik, the post “Parallel reduce in the Stan language” is actually another write-up of how I think the new parallel reduce facilities are to be used in the Stan language. I am going to link it in the parallel RFC as well.
