How could I speed up sampling with threading?

I’m now trying to run sampling jobs. After checking the performance with threading, I found that it did not reduce the estimation time.
What is required to speed up sampling?

Here are my experiment results, summarized in the table below.

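| index | compile num_threads | sampling num_threads | n_jobs | iter | chains | average time | std time |
|---|---|---|---|---|---|---|---|
| 1 | Vanilla | 1 | 4 | 2000 | 2 | 0.883 | 0.121 |
| 2 | Vanilla | 1 | 2 | 2000 | 2 | 0.834 | 0.0624 |
| 3 | Vanilla | 1 | 4 | 2000 | 2 | 0.935 | 0.181 |
| 4 | Vanilla | 1 | 1 | 2000 | 2 | 1.21 | 0.413 |
| 5 | Vanilla | 1 | 1 | 2000 | 1 | 0.554 | 0.0567 |
| 6 | Vanilla | 1 | 1 | 20000 | 1 | 4.36 | 0.296 |
| 7 | Vanilla | 1 | 8 | 2000 | 2 | 0.82 | 0.0896 |
| 8 | 1 | 1 | 4 | 2000 | 2 | 0.936 | 0.0743 |
| 9 | 1 | 1 | 2 | 2000 | 2 | 0.967 | 0.101 |
| 10 | 1 | 1 | 4 | 2000 | 2 | 0.937 | 0.0693 |
| 11 | 1 | 1 | 1 | 2000 | 2 | 1.27 | 0.0501 |
| 12 | 1 | 1 | 1 | 2000 | 1 | 0.678 | 0.0833 |
| 13 | 1 | 1 | 1 | 20000 | 1 | 5.34 | 0.804 |
| 14 | 1 | 1 | 8 | 2000 | 2 | 0.962 | 0.0777 |
| 15 | 4 | 4 | 4 | 2000 | 2 | 3.63 | 0.547 |
| 16 | 4 | 4 | 2 | 2000 | 2 | 3.54 | 0.434 |
| 17 | 4 | 1 | 4 | 2000 | 2 | 0.99 | 0.0883 |
| 18 | 4 | 4 | 1 | 2000 | 2 | 5.4 | 0.576 |
| 19 | 4 | 4 | 1 | 2000 | 1 | 2.68 | 0.216 |
| 20 | 4 | 4 | 1 | 20000 | 1 | 21.1 | 1.11 |
| 21 | 4 | 4 | 8 | 2000 | 2 | 3.27 | 0.274 |
| 22 | 8 | 8 | 4 | 2000 | 2 | 3.28 | 0.456 |
| 23 | 8 | 8 | 2 | 2000 | 2 | 3.43 | 0.263 |
| 24 | 8 | 1 | 4 | 2000 | 2 | 0.865 | 0.13 |
| 25 | 8 | 8 | 1 | 2000 | 2 | 4.91 | 0.434 |
| 26 | 8 | 8 | 1 | 2000 | 1 | 2.52 | 0.314 |
| 27 | 8 | 8 | 1 | 20000 | 1 | 22.8 | 4.12 |
| 28 | 8 | 8 | 8 | 2000 | 2 | 3.31 | 0.341 |
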
  • Operating System: CentOS Linux release 7.2.1511
  • Python Version: 3.6.5
  • PyStan Version: 2.18.0
  • Compiler/Toolkit: gcc (GCC) 6.3.1 20170216 (Red Hat 6.3.1-3)
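
For anyone wanting to reproduce this kind of measurement, here is a minimal sketch of a timing harness that reports the mean and standard deviation over repeated runs. The helper name `time_runs` and the repeat count are assumptions for illustration, not the script used for the table above. The `-DSTAN_THREADS` compile flag and the `STAN_NUM_THREADS` environment variable are how Stan's threading is normally switched on; the PyStan calls are left as comments because `model_code` and `data` are placeholders.

```python
import os
import statistics
import time


def time_runs(fn, repeats=5):
    """Call fn() `repeats` times; return (mean, sample std) of wall-clock seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations), statistics.stdev(durations)


# Stan reads the number of threads from this environment variable at run time.
os.environ["STAN_NUM_THREADS"] = "4"

# Threading also has to be enabled when the model is compiled.
extra_compile_args = ["-pthread", "-DSTAN_THREADS"]

# With PyStan 2.18 installed, the pieces would be combined roughly like this
# (model_code and data are placeholders, not the model from this thread):
#   import pystan
#   model = pystan.StanModel(model_code=model_code,
#                            extra_compile_args=extra_compile_args)
#   mean_s, std_s = time_runs(
#       lambda: model.sampling(data=data, iter=2000, chains=2, n_jobs=1))
```

Timing the `sampling` call alone, rather than compilation plus sampling, keeps the comparison between thread settings clean.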

Do you see a performance increase with CmdStan? If not, then this is a modeling question rather than a PyStan one.

As for performance, make sure your problem is large enough (long/wide) to see a benefit.

As another experiment, I changed the model to a first-order linear regression and switched to simulated data.
With that setup, I was able to shorten the computation time.
As you pointed out, a large data set seems to be required for threading to pay off.
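
For illustration, a simulated first-order linear regression data set of adjustable size could be generated along these lines. The function name, coefficients, noise level, and sizes are all assumptions chosen for the sketch, not the values used in the experiment.

```python
import random


def simulate_linear_data(n, intercept=1.0, slope=2.0, sigma=0.5, seed=0):
    """Draw n points from y = intercept + slope * x + Normal(0, sigma) noise."""
    rng = random.Random(seed)  # local generator so results are reproducible
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    y = [intercept + slope * xi + rng.gauss(0.0, sigma) for xi in x]
    # Dictionary layout matches what a Stan data block with N, x, y would expect.
    return {"N": n, "x": x, "y": y}


# Varying n makes it easy to check at what size threading starts to help.
small = simulate_linear_data(100)
large = simulate_linear_data(100_000)
```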

Thanks!

I haven’t checked the notebook above, but the table doesn’t mention it: in general, I also see a considerable dependence of the sampling run time of map_rect-based Stan models on how the data is split into shards (i.e., the number of shards). There is at least one thread in this forum providing examples of this kind…
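
To experiment with the number of shards from the Python side, something like the following sketch could be used to prepare the data. It assumes the model expects a rectangular array (one row per shard) and pads the last shard when the data does not divide evenly; the helper name `split_into_shards` and the padding scheme are illustrative assumptions, and the Stan function would need the returned sizes to skip the padded entries.

```python
def split_into_shards(values, n_shards, pad_value=0.0):
    """Split values into n_shards equal-length rows, padding the tail.

    Returns (shards, sizes), where sizes[i] is the number of real
    (unpadded) entries in shards[i].
    """
    if n_shards < 1:
        raise ValueError("n_shards must be at least 1")
    shard_len = -(-len(values) // n_shards)  # ceiling division
    shards, sizes = [], []
    for i in range(n_shards):
        chunk = list(values[i * shard_len:(i + 1) * shard_len])
        sizes.append(len(chunk))
        # Pad so that every row has the same length.
        shards.append(chunk + [pad_value] * (shard_len - len(chunk)))
    return shards, sizes


y = [float(i) for i in range(10)]
for n_shards in (1, 2, 4):
    shards, sizes = split_into_shards(y, n_shards)
    print(n_shards, sizes)
```

Re-running the same model with several shard counts, and timing each, is one way to see how strongly the run time depends on the split.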

Hi ermeel,

Based on your advice, I found McElreath’s example. I’ll switch the model and data to it.

Thanks!
