How could I speed up sampling with threading?


#1

I’m now trying to speed up my sampling jobs with threading, but after benchmarking it I couldn’t improve the estimation speed.
What is required to speed up sampling?

Here are my experiment results.
A summary of the estimation times is in the table below.

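| index | compile num_threads | sampling num_threads | n_jobs | iter | chains | average time | std time |
|---|---|---|---|---|---|---|---|
| 1 | Vanilla | 1 | 4 | 2000 | 2 | 0.883 | 0.121 |
| 2 | Vanilla | 1 | 2 | 2000 | 2 | 0.834 | 0.0624 |
| 3 | Vanilla | 1 | 4 | 2000 | 2 | 0.935 | 0.181 |
| 4 | Vanilla | 1 | 1 | 2000 | 2 | 1.21 | 0.413 |
| 5 | Vanilla | 1 | 1 | 2000 | 1 | 0.554 | 0.0567 |
| 6 | Vanilla | 1 | 1 | 20000 | 1 | 4.36 | 0.296 |
| 7 | Vanilla | 1 | 8 | 2000 | 2 | 0.82 | 0.0896 |
| 8 | 1 | 1 | 4 | 2000 | 2 | 0.936 | 0.0743 |
| 9 | 1 | 1 | 2 | 2000 | 2 | 0.967 | 0.101 |
| 10 | 1 | 1 | 4 | 2000 | 2 | 0.937 | 0.0693 |
| 11 | 1 | 1 | 1 | 2000 | 2 | 1.27 | 0.0501 |
| 12 | 1 | 1 | 1 | 2000 | 1 | 0.678 | 0.0833 |
| 13 | 1 | 1 | 1 | 20000 | 1 | 5.34 | 0.804 |
| 14 | 1 | 1 | 8 | 2000 | 2 | 0.962 | 0.0777 |
| 15 | 4 | 4 | 4 | 2000 | 2 | 3.63 | 0.547 |
| 16 | 4 | 4 | 2 | 2000 | 2 | 3.54 | 0.434 |
| 17 | 4 | 1 | 4 | 2000 | 2 | 0.99 | 0.0883 |
| 18 | 4 | 4 | 1 | 2000 | 2 | 5.4 | 0.576 |
| 19 | 4 | 4 | 1 | 2000 | 1 | 2.68 | 0.216 |
| 20 | 4 | 4 | 1 | 20000 | 1 | 21.1 | 1.11 |
| 21 | 4 | 4 | 8 | 2000 | 2 | 3.27 | 0.274 |
| 22 | 8 | 8 | 4 | 2000 | 2 | 3.28 | 0.456 |
| 23 | 8 | 8 | 2 | 2000 | 2 | 3.43 | 0.263 |
| 24 | 8 | 1 | 4 | 2000 | 2 | 0.865 | 0.13 |
| 25 | 8 | 8 | 1 | 2000 | 2 | 4.91 | 0.434 |
| 26 | 8 | 8 | 1 | 2000 | 1 | 2.52 | 0.314 |
| 27 | 8 | 8 | 1 | 20000 | 1 | 22.8 | 4.12 |
| 28 | 8 | 8 | 8 | 2000 | 2 | 3.31 | 0.341 |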
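One way to read the table is to divide each configuration's average time by the vanilla run with the same `iter` (all with n_jobs=1, chains=1). The snippet below only restates rows 5, 6, 12, 13, 19, 20, 26 and 27; it assumes "Vanilla" means a build compiled without the threading flags, and `slowdown` is a hypothetical helper name.

```python
# Average times copied from the table rows with n_jobs=1 and chains=1.
# Keys are (compile num_threads, sampling num_threads, iter).
# Assumption: "Vanilla" = model compiled without threading flags.
avg_time = {
    ("Vanilla", 1, 2000): 0.554,   # row 5
    ("Vanilla", 1, 20000): 4.36,   # row 6
    (1, 1, 2000): 0.678,           # row 12
    (1, 1, 20000): 5.34,           # row 13
    (4, 4, 2000): 2.68,            # row 19
    (4, 4, 20000): 21.1,           # row 20
    (8, 8, 2000): 2.52,            # row 26
    (8, 8, 20000): 22.8,           # row 27
}


def slowdown(compile_threads, sampling_threads, n_iter):
    """Ratio of a configuration's average time to the vanilla run at the same iter."""
    baseline = avg_time[("Vanilla", 1, n_iter)]
    return avg_time[(compile_threads, sampling_threads, n_iter)] / baseline


for n_iter in (2000, 20000):
    for threads in (1, 4, 8):
        ratio = slowdown(threads, threads, n_iter)
        print(f"iter={n_iter}, threads={threads}: {ratio:.2f}x the vanilla time")
```

For these rows every ratio is above 1: the threaded builds took roughly 1.2x (1 thread) and about 4.5x to 5.2x (4 or 8 threads) the vanilla time, for both 2000 and 20000 iterations.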
  • Operating System: CentOS Linux release 7.2.1511
  • Python Version: 3.6.5
  • PyStan Version: 2.18.0
  • Compiler/Toolkit: gcc (GCC) 6.3.1 20170216 (Red Hat 6.3.1-3)
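For reference, the PyStan 2.18 documentation describes enabling Stan's threading by compiling with `-pthread -DSTAN_THREADS` and setting the `STAN_NUM_THREADS` environment variable. A minimal sketch under that assumption (the helper names below are hypothetical, not part of PyStan):

```python
import os

# Compiler flags the PyStan 2.18+ docs list for enabling Stan's threading support.
THREAD_COMPILE_ARGS = ["-pthread", "-DSTAN_THREADS"]


def set_stan_num_threads(num_threads):
    """Set STAN_NUM_THREADS, which a threaded Stan build reads when it runs."""
    os.environ["STAN_NUM_THREADS"] = str(num_threads)


def build_threaded_model(model_code, num_threads):
    """Compile a model with threading enabled (needs PyStan 2.18+ and a C++ compiler)."""
    import pystan  # imported lazily so the helpers above work without PyStan installed

    set_stan_num_threads(num_threads)
    return pystan.StanModel(model_code=model_code,
                            extra_compile_args=THREAD_COMPILE_ARGS)
```

Usage would look like `model = build_threaded_model(code, 4)` followed by `model.sampling(data=data, iter=2000, chains=2, n_jobs=2)`. As far as I know, in Stan 2.18 these threads are only used inside `map_rect`, so a model that does not call `map_rect` gains nothing within a chain; `n_jobs` instead runs separate chains in parallel processes.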

#2

Do you see a performance increase with CmdStan? If not, then this is actually a modeling question rather than a PyStan one.

For the performance thing, make sure your problem is large enough (long/wide) to see the benefit.


#3

> For the performance thing, make sure your problem is large enough (long/wide) to see the benefit.

As another experiment, I switched the model to a first-order linear regression and the data to a simulated dataset.
With that change, I was able to shorten the computation time.
As you pointed out, threading only seems to pay off when the dataset is large enough.
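A simulated first-order linear regression makes it easy to grow the data and watch where threading starts to help. A minimal sketch, assuming the model y ~ normal(alpha + beta * x, sigma); the Stan program, the true parameter values and `simulate_linreg` are all hypothetical choices for illustration, not the exact setup used above:

```python
import random

# Hypothetical Stan program: first-order linear regression (no map_rect).
LINREG_CODE = """
data {
  int<lower=1> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  y ~ normal(alpha + beta * x, sigma);
}
"""


def simulate_linreg(n, alpha=1.0, beta=2.0, sigma=0.5, seed=0):
    """Draw n points from y = alpha + beta * x + noise, returned as a Stan data dict."""
    rng = random.Random(seed)  # seeded so repeated benchmarks use identical data
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    y = [alpha + beta * xi + rng.gauss(0.0, sigma) for xi in x]
    return {"N": n, "x": x, "y": y}
```

Calling it with increasing `n` (for example 1,000, 10,000 and 100,000) gives datasets of growing length for the same timing comparison as in the table.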

Thanks!


#4

Haven’t checked the notebook above, but the table doesn’t mention it: in general, I also see the sampling run time of map_rect-based Stan models depend quite a bit on how the data is split into shards (i.e., the number of shards). There is at least one thread in this forum providing examples of this kind…
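Because `map_rect` takes its per-shard real data as a two-dimensional array (one row per shard), every shard must hold the same number of values, so the number of shards has to divide the data evenly. A minimal sketch of packing data that way, assuming each row stores the shard's x values followed by its y values; `pack_shards` and the `J`/`K`/`x_r` names are hypothetical, and the Stan function passed to `map_rect` must unpack rows with the same convention:

```python
def pack_shards(x, y, n_shards):
    """Split paired observations into n_shards equal-size rows for map_rect.

    Each row holds that shard's x values followed by its y values, so
    every row has length 2 * K where K = len(x) // n_shards.
    """
    n = len(x)
    if len(y) != n:
        raise ValueError("x and y must have the same length")
    if n % n_shards != 0:
        raise ValueError(f"{n} observations cannot be split evenly into {n_shards} shards")
    size = n // n_shards
    x_r = []
    for j in range(n_shards):
        lo, hi = j * size, (j + 1) * size
        x_r.append(list(x[lo:hi]) + list(y[lo:hi]))  # one rectangular row per shard
    return {"J": n_shards, "K": size, "x_r": x_r}
```

Re-packing the same data with several shard counts (for example 1, 2, 4 and 8) and timing each run is one way to see the dependence on the number of shards described above.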


#5

Hi ermeel,

Based on your advice, I found McElreath’s example. I’ll rework my models and data based on it.

Thanks!