Parallelizing the sampler (not the model)

Bob_Carpenter · February 16, 2022, 9:40pm

We were just discussing this in another thread starting with this comment from @Red-Portal.

Memory pressure becomes a much bigger deal when parallelizing, because typically only the relatively tiny L1 cache (and, on many current CPUs, the L2) is private to each core, while the L3 cache and RAM are shared. There's also the problem of poor memory locality caused by a bunch of ad-hoc allocations. Allocating everything together in contiguous memory could be a big saving. This is something I really messed up by coding Stan arrays as C++ std::vector, and now we have this problem all over the place.
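To make the locality point concrete, here is a minimal C++ sketch contrasting two layouts for an array of N vectors of length K. The names `NestedArray`, `FlatArray`, and `sum` are hypothetical and are not Stan's actual types or API. In the nested layout each inner std::vector is a separate heap allocation, so rows may be scattered across memory. In the flat layout all N * K values live in one allocation, and element (i, j) is found at offset i * K + j.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration, not Stan's data structures.

// Nested layout: one heap allocation per row, so rows can be
// scattered around the heap.
using NestedArray = std::vector<std::vector<double>>;

// Contiguous layout: a single allocation of rows * cols doubles,
// stored row-major, so consecutive rows are adjacent in memory.
class FlatArray {
 public:
  FlatArray(std::size_t rows, std::size_t cols)
      : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

  // Element (i, j) lives at offset i * cols + j in the shared buffer.
  double& operator()(std::size_t i, std::size_t j) {
    assert(i < rows_ && j < cols_);  // debug-only bounds check
    return data_[i * cols_ + j];
  }
  double operator()(std::size_t i, std::size_t j) const {
    assert(i < rows_ && j < cols_);
    return data_[i * cols_ + j];
  }

  std::size_t rows() const { return rows_; }
  std::size_t cols() const { return cols_; }

  // Pointer to the first element of row i.
  const double* row(std::size_t i) const { return data_.data() + i * cols_; }

 private:
  std::size_t rows_;
  std::size_t cols_;
  std::vector<double> data_;
};

// With FlatArray this loop is one linear sweep through memory.
double sum(const FlatArray& a) {
  double total = 0.0;
  for (std::size_t i = 0; i < a.rows(); ++i)
    for (std::size_t j = 0; j < a.cols(); ++j) total += a(i, j);
  return total;
}

// With NestedArray the loop jumps to a separate allocation per row.
double sum(const NestedArray& a) {
  double total = 0.0;
  for (const auto& r : a)
    for (double x : r) total += x;
  return total;
}
```

Both versions compute the same result. The flat one walks memory linearly, which suits the cache and hardware prefetcher, and it does one allocation instead of N + 1. The trade-off is that every row must have the same length, which is the case for fixed-size arrays of vectors.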

Related topics

Topic	Category	Replies	Views	Activity
Evaluating parallelization performance	Developers	23	1907	October 1, 2019
Parallel dynamic HMC merits	Developers / features	38	3219	September 17, 2019
Multicore Speedups are different between models	Algorithms	25	4786	September 11, 2017
Within-chain parallelization idea (maybe crazy)	Developers	35	3000	February 24, 2022
What is the main bottleneck in parallelizing Stan?	Developers	6	942	September 27, 2019