I’m trying to make sense of the results below and to figure out how to handle STAN_NUM_THREADS and num_threads in my Julia package StanSample.jl.
My questions:
- Is it correct that, when the STAN_NUM_THREADS environment variable is not set, the .csv files still report NUM_THREADS=1 even if the command-line argument num_threads is used? For example:
  fitzhughnagumo num_threads=4 sample ...
- Currently StanSample.jl uses STAN_NUM_THREADS if it is defined. For a set of tests I changed that so that STAN_NUM_THREADS (“SNT” in the tables below) follows num_threads. In the tables below, STAN_NUM_THREADS and num_threads (“threads” in the tables) are therefore always identical, and the NUM_THREADS value in the chain .csv files reflects this. Is this a correct way to manipulate STAN_NUM_THREADS?
- In the tables below I draw 10000 samples either in a single chain, as 2500 samples in each of 4 chains, or as 1250 samples in each of 8 chains. I combine the chains into a single DataFrame, record 4 elapsed times, and compute their mean. Is this a correct way to obtain 10000 draws?
- The results in the tables are interesting. I had expected that cmdstan (or maybe Julia) would need to provide the threads, but that does not seem to be the case. My conclusion is that I get the benefit of Julia’s run() command, which starts the model as Julia’s immediate child process using fork and exec calls.
- If that is the case, would this be an option internally in Stan’s cmdstan?
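The setup described in these questions can be sketched in a few lines. This is a minimal, language-neutral sketch in Python, not StanSample.jl’s actual code (in Julia the same idea maps onto run together with withenv or addenv); draws_per_chain, chain_commands and run_concurrently are hypothetical helper names. It assumes the compiled model accepts num_threads=N sample num_samples=K, following the command shown above. It sets STAN_NUM_THREADS to the same value as num_threads in each child’s environment and starts one child process per chain, so the operating system can run the chains on separate cores.

```python
import os
import subprocess
import sys


def draws_per_chain(total_draws: int, chains: int) -> int:
    """Split a total number of draws evenly across chains (10000 over 4 -> 2500)."""
    if total_draws % chains != 0:
        raise ValueError(f"{total_draws} draws cannot be split evenly over {chains} chains")
    return total_draws // chains


def chain_commands(model_path: str, num_threads: int, total_draws: int, chains: int):
    """Hypothetical helper: build one (argv, env) pair per chain.

    STAN_NUM_THREADS in the child's environment is set to the same value as
    the num_threads command-line argument, mirroring the test setup above.
    """
    n = draws_per_chain(total_draws, chains)
    env = dict(os.environ, STAN_NUM_THREADS=str(num_threads))
    # Per-chain arguments (data, output file, seed, ...) are omitted in this sketch.
    argv = [model_path, f"num_threads={num_threads}", "sample", f"num_samples={n}"]
    return [(list(argv), env) for _ in range(chains)]


def run_concurrently(commands):
    """Start every command as its own child process, then wait for all of them.

    The chains overlap in time because they are separate OS processes;
    this needs no threads inside the sampler itself.
    """
    procs = [subprocess.Popen(argv, env=env) for argv, env in commands]
    return [p.wait() for p in procs]


if __name__ == "__main__":
    # Demo only: a tiny Python child stands in for the compiled Stan model and
    # prints the STAN_NUM_THREADS value it sees in its environment.
    fake_model = [sys.executable, "-c", "import os; print(os.environ['STAN_NUM_THREADS'])"]
    demo_env = dict(os.environ, STAN_NUM_THREADS="4")
    print(run_concurrently([(fake_model, demo_env)] * 4))
```

The demo uses the Python interpreter as a stand-in for the compiled model, so it runs without Stan installed. A real call would also pass per-chain data and output arguments, which are left out here.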
Any feedback is highly appreciated. The current test uses an example from DiffEqBayesStan.jl. I also want to run a similar set of tests on the red cards data, and later check some of this on the Julia forum.
-------------------------- TABLE --------------------
JULIA_NUM_THREADS = 10
9×9 DataFrame
Row │ SNT threads chains samples time_1 time_2 time_3 time_4 mean
1 │ 1 1 1 10000 42.3248 36.6654 39.8239 37.3989 39.0532
2 │ 1 1 4 10000 16.8682 14.8487 14.0331 14.8553 15.1513
3 │ 1 1 8 10000 9.83072 9.72722 9.10842 11.134 9.95008
4 │ 4 4 1 10000 36.1733 42.266 34.3276 39.0783 37.9613
5 │ 4 4 4 10000 15.3396 16.7139 13.6394 14.3342 15.0068
6 │ 4 4 8 10000 10.0435 9.98177 9.82626 10.246 10.0244
7 │ 8 8 1 10000 48.2951 34.1351 38.4091 35.1433 38.9956
8 │ 8 8 4 10000 16.6352 14.9329 13.9775 15.2577 15.2008
9 │ 8 8 8 10000 9.74233 10.0634 9.70536 10.0111 9.88054
JULIA_NUM_THREADS = 4
9×9 DataFrame
Row │ SNT threads chains samples time_1 time_2 time_3 time_4 mean
1 │ 1 1 1 10000 38.8417 34.8116 40.922 35.1548 37.4325
2 │ 1 1 4 10000 14.3087 14.1105 14.6233 14.6406 14.4208
3 │ 1 1 8 10000 10.0635 10.2226 10.6875 10.2459 10.3049
4 │ 4 4 1 10000 36.8625 52.1193 40.957 40.2237 42.5406
5 │ 4 4 4 10000 13.7354 14.5204 17.5627 14.3537 15.0431
6 │ 4 4 8 10000 9.50104 10.3231 10.7328 11.1363 10.4233
7 │ 8 8 1 10000 42.508 43.9822 34.1056 34.2806 38.7191
8 │ 8 8 4 10000 16.2068 14.4028 15.0074 16.6844 15.5754
9 │ 8 8 8 10000 10.2654 10.8903 9.45841 10.4184 10.2581
JULIA_NUM_THREADS = 1
9×9 DataFrame
Row │ SNT threads chains samples time_1 time_2 time_3 time_4 mean
1 │ 1 1 1 10000 31.6349 32.2269 39.5003 33.8465 34.3021
2 │ 1 1 4 10000 13.1042 12.5378 12.0559 12.1937 12.4729
3 │ 1 1 8 10000 8.65819 8.85462 8.29159 8.5041 8.57712
4 │ 4 4 1 10000 32.9886 38.8642 36.1392 41.9363 37.4821
5 │ 4 4 4 10000 12.4424 13.13 12.3069 12.0543 12.4834
6 │ 4 4 8 10000 9.39871 9.1411 8.82557 8.17123 8.88415
7 │ 8 8 1 10000 38.5501 36.1817 33.7262 33.6519 35.5275
8 │ 8 8 4 10000 12.1651 14.5109 14.3144 12.7943 13.4462
9 │ 8 8 8 10000 9.12816 8.51613 8.50016 8.5085 8.66324
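The mean column is the plain arithmetic mean of the four elapsed times. As a quick check, and to put the chain counts side by side, the Python sketch below recomputes the means for the SNT = threads = 1 rows of the JULIA_NUM_THREADS = 10 table and the speedup relative to a single chain. speedups is a hypothetical helper, and the timings are copied verbatim from that table.

```python
from statistics import mean

# Elapsed times copied from the JULIA_NUM_THREADS = 10 table, rows with SNT = threads = 1,
# keyed by number of chains (10000 draws in total in every row).
timings = {
    1: [42.3248, 36.6654, 39.8239, 37.3989],
    4: [16.8682, 14.8487, 14.0331, 14.8553],
    8: [9.83072, 9.72722, 9.10842, 11.134],
}


def speedups(timings_by_chains):
    """Return {chains: (mean elapsed time, speedup relative to the 1-chain mean)}."""
    base = mean(timings_by_chains[1])
    return {c: (mean(t), base / mean(t)) for c, t in sorted(timings_by_chains.items())}


for chains, (avg, speedup) in speedups(timings).items():
    print(f"{chains} chain(s): mean {avg:.4f}, speedup {speedup:.2f}x")
```

For these rows the means come out at about 39.05, 15.15 and 9.95, so 4 chains are roughly 2.6 times and 8 chains roughly 3.9 times faster than 1 chain, i.e. less than the chain count.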