Parallel autodiff v4

bbbales2 · February 18, 2020, 8:55pm

@andre.pfeuffer thanks for the model. I wrote a parallel version of it and ran a few benchmarks so we can add that to the collection here.

bpl.data.R (6.1 KB) bpl.stan (1.8 KB) bpl_parallel.stan (2.0 KB)

The base model has 1000 data points. To make the model slower, I added a ‘rep’ argument to the data file; rep=2, for example, doubles the data length by including the data twice.

rep = 10
grainsize = 1250

8 threads parallel 88s
1 thread parallel 179s
1 thread serial 214s

rep = 10
grainsize = 125

8 threads parallel 60s
1 thread parallel 171s
1 thread serial 214s

rep = 1
grainsize = 125

8 threads parallel 7.7s
1 thread parallel 18s
1 thread serial 18s
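
For anyone who wants the ratios rather than raw seconds, here is a small Python sketch that turns the timings above into speedups. The numbers are copied from this post; the names (`runs`, `summarize`, `BASE_N`) are made up for illustration. The slice count assumes each slice holds exactly `grainsize` elements, which is only an approximation, since the scheduler may treat grainsize as a hint rather than a fixed size.

```python
# Timings in seconds, copied from the benchmarks in this post.
runs = [
    {"rep": 10, "grainsize": 1250, "parallel_8": 88.0, "parallel_1": 179.0, "serial_1": 214.0},
    {"rep": 10, "grainsize": 125, "parallel_8": 60.0, "parallel_1": 171.0, "serial_1": 214.0},
    {"rep": 1, "grainsize": 125, "parallel_8": 7.7, "parallel_1": 18.0, "serial_1": 18.0},
]

BASE_N = 1000  # data points in the base model (rep = 1)


def summarize(run):
    """Derive data size, approximate slice count, and speedups for one run."""
    n = BASE_N * run["rep"]
    return {
        "n": n,
        # Ceiling division; assumes every slice holds exactly `grainsize` items.
        "slices": -(-n // run["grainsize"]),
        # 8-thread parallel run compared against the serial model.
        "speedup_vs_serial": run["serial_1"] / run["parallel_8"],
        # 8-thread parallel run compared against the same code on 1 thread.
        "speedup_vs_parallel_1": run["parallel_1"] / run["parallel_8"],
    }


summaries = [summarize(r) for r in runs]

for run, s in zip(runs, summaries):
    print(
        f"rep={run['rep']:>2} grainsize={run['grainsize']:>4} "
        f"n={s['n']:>5} slices~{s['slices']:>2} "
        f"vs serial: {s['speedup_vs_serial']:.2f}x "
        f"vs 1-thread parallel: {s['speedup_vs_parallel_1']:.2f}x"
    )
```

Under that assumption, grainsize = 1250 with rep = 10 gives roughly one slice per thread on 8 threads, while grainsize = 125 gives about ten slices per thread, which may be part of why the smaller grainsize ran faster here.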
