Plans for parallelization

If you want a large effective sample size per unit of wall time, more chains will do that for you in an embarrassingly parallel fashion.
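As a toy illustration of that embarrassing parallelism (this is not Stan; the random-walk "chain" and the chain/draw counts here are invented for the example), independent chains can be farmed out to separate processes and their draws pooled afterward:

```python
import multiprocessing as mp
import random

def run_chain(args):
    """Toy stand-in for one MCMC chain: a serial random walk
    producing n draws.  Each chain is independent of the others."""
    seed, n = args
    rng = random.Random(seed)
    x, draws = 0.0, []
    for _ in range(n):
        x += rng.gauss(0, 1)
        draws.append(x)
    return draws

if __name__ == "__main__":
    n_chains, n_draws = 4, 1000
    # Chains share nothing, so they map cleanly onto a process pool.
    with mp.Pool(n_chains) as pool:
        chains = pool.map(run_chain, [(s, n_draws) for s in range(n_chains)])
    # Total draws (and hence effective sample size per wall time)
    # scale linearly with the number of chains.
    print(len(chains), len(chains[0]))
```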

If you want the fastest wall time to an effective sample size of 100, that can’t be done by parallelizing with more chains. The real bottleneck there is warmup, which every chain has to run on its own.
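To see why, here’s a back-of-the-envelope model (all the numbers are invented): if each chain needs W warmup iterations and hitting the target effective sample size needs D post-warmup draws in total, then with C chains the wall time in iterations is roughly W + D / C, which is floored at W no matter how large C gets.

```python
def wall_time_iters(warmup, total_draws_needed, n_chains):
    """Wall time in iterations: every chain pays warmup in full,
    but the post-warmup draws are split across chains."""
    return warmup + total_draws_needed / n_chains

# More chains shrink only the second term; warmup is the floor.
for c in (1, 4, 16, 64):
    print(c, wall_time_iters(1000, 1000, c))
```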

The chains can’t be spread over more CPUs because each chain is serial (hence the name).

But as the other responders point out, what can be parallelized is the implementation of the log density and gradient.
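In spirit, that’s a parallel sum over shards of the data: the log density is a sum of per-observation terms, so the partial sums (and their gradients) can be computed concurrently and then combined. A minimal sketch of the shape of that computation, with a made-up normal log likelihood and invented data (Stan does the real work internally, with autodiff for the gradients):

```python
import math
from concurrent.futures import ProcessPoolExecutor

def shard_loglik(args):
    """Log likelihood of one shard of the data under Normal(mu, 1).
    Each shard is summed independently, then the shards are combined."""
    ys, mu = args
    return sum(-0.5 * (y - mu) ** 2 - 0.5 * math.log(2 * math.pi)
               for y in ys)

def parallel_loglik(ys, mu, n_shards=4):
    """Split the data into shards, evaluate them in worker
    processes, and add up the partial sums."""
    shards = [ys[i::n_shards] for i in range(n_shards)]
    with ProcessPoolExecutor(n_shards) as ex:
        return sum(ex.map(shard_loglik, [(s, mu) for s in shards]))

if __name__ == "__main__":
    data = [0.1 * i for i in range(1000)]
    print(parallel_loglik(data, mu=0.5))
```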

That’s RStan, and it actually requires more than that in terms of dynamic memory overhead for copies.

CmdStan can stream its output to disk, so there’s no memory overhead beyond the data itself, which is rarely a bottleneck. RStan can stream data out, but I’m still not sure whether it can turn off accumulating the draws in memory. I don’t know about PyStan; if it can’t, feel free to open a feature request.
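The streaming behavior is easy to picture: write each draw to disk the moment it’s produced and keep nothing but the sampler’s current state in memory. A sketch with a toy sampler and an invented file name (CmdStan’s actual CSV output has more columns and header metadata):

```python
import csv
import random

def stream_draws(path, n_draws, seed=0):
    """Write one draw per row as it is produced.  Memory stays
    O(1) in the number of draws, unlike accumulating everything
    in RAM and writing at the end."""
    rng = random.Random(seed)
    x = 0.0
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["x"])          # header row
        for _ in range(n_draws):
            x += rng.gauss(0, 1)        # only the current state is held
            writer.writerow([x])

if __name__ == "__main__":
    stream_draws("draws.csv", 1000)
```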