I never understood what n_jobs does. Suppose I have N chains and I want each chain to run in parallel on M cores (assume N*M is less than the number of cores I have on one node). What I see is that if I set n_jobs >= N*M, nothing interesting happens: the N chains run in parallel on just one core each (not on M cores). What should I set?
OS: tested on both Ubuntu 18.04 and macOS Catalina.
n_jobs sets the number of parallel processes.
For within-chain parallelism you want to use threading. See the docs: https://pystan.readthedocs.io/en/latest/threading_support.html
That example did not work for me. In any case, I do not understand your answer.
What example did not work?
What did you fail to understand? What n_jobs does?
You can try to run multiple chains and test different n_jobs.
For example, what do you see when you run 30 chains with n_jobs set to 1, 2, or 4? How many chains run simultaneously?
There is an example in the link you referred to earlier.
I asked a specific question and did not understand how your answer addresses it, so I will rephrase. I want 4 chains, and I want each chain to run on 2 cores, requiring a total of 4*2=8 cores. My CPU has 12 cores. My questions are as follows:
"I want 4 chains, and I want each chain to run on 2 cores, requiring a total of 4*2=8 cores." Is this doable in PyStan?
Following the example given in the link you shared, I set:
os.environ['STAN_NUM_THREADS'] = "2"
fit4 = sm4.sampling(data=data, iter=steps, chains=5, warmup=burn, thin=thin_by, seed=random_seed, n_jobs=10)
Still, I see 5 chains running in parallel, each using 1 core, so a total of 5 cores are in use!
Instead, if I had set
I would see the same result.
Like I said, effectively n_jobs = min(nchain, cpu_cores): you have 5 chains and n_jobs = 10, so 5 cores are used.
If you set n_jobs=1, the chains are processed serially: one chain is sampled, then the next one.
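The rule above can be sketched in a few lines of plain Python. This only illustrates the behaviour discussed in this thread; it is not PyStan's actual scheduling code. The helper names `simultaneous_chains` and `sampling_rounds` are made up for this example, and treating n_jobs=-1 as "use all CPUs" is an assumption based on the PyStan 2 docstring.

```python
import math


def simultaneous_chains(chains: int, n_jobs: int, cpu_cores: int) -> int:
    """How many chains are sampled at the same time (hypothetical helper).

    Assumption: n_jobs=-1 means "one process per CPU"; otherwise n_jobs is
    the number of worker processes, and it cannot exceed the number of chains.
    """
    workers = cpu_cores if n_jobs == -1 else n_jobs
    return min(chains, workers)


def sampling_rounds(chains: int, n_jobs: int, cpu_cores: int) -> int:
    """Rough number of back-to-back batches needed to finish all chains."""
    return math.ceil(chains / simultaneous_chains(chains, n_jobs, cpu_cores))


if __name__ == "__main__":
    # The case from this thread: 5 chains, n_jobs=10, 12 logical cores.
    print(simultaneous_chains(chains=5, n_jobs=10, cpu_cores=12))
    # The experiment suggested earlier: 30 chains with n_jobs = 1, 2, 4.
    for n_jobs in (1, 2, 4):
        print(n_jobs, simultaneous_chains(30, n_jobs, 12), sampling_rounds(30, n_jobs, 12))
```

Each chain here is one single-threaded process, which is why raising n_jobs beyond the number of chains changes nothing.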
If you want to use threading, you also need to specify the
extra_compile_args shown in the example. (for threading, you need to use
I am not sure whether threading depends on the problem at hand. When I run the example code, I find that all of the cores (here cores = number of physical cores (6) * 2 threads per core = 12) are in use, with moderate load on each. This happens even if I set os.environ['STAN_NUM_THREADS'] = "2" with chains=4, n_jobs=4.
Now if I do the same thing on my own problem (which involves far more free variables than the example above), I see only 4 cores in use, each at full load (100%). I don't think I understood anything from this exercise.
Is your model using map_rect? If not, then enabling threading will not help.
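For reference, here is a minimal sketch of what a threaded setup might look like in PyStan 2. It assumes the compile flags from the threading docs linked above. The Stan program is adapted from the map_rect logistic regression example in the Stan User's Guide. The names `MODEL_CODE`, `EXTRA_COMPILE_ARGS` and `fit_with_threads` are made up for this illustration, and the whole thing is untested against your setup.

```python
import os

# Compile flags for threading, as given in the PyStan threading docs.
EXTRA_COMPILE_ARGS = ["-pthread", "-DSTAN_THREADS"]

# Toy logistic regression split into 3 shards of 4 observations each.
# Only the work inside map_rect can be spread over threads.
MODEL_CODE = """
functions {
  vector lr(vector beta, vector theta, real[] x, int[] y) {
    real lp = bernoulli_logit_lpmf(y | beta[1] + to_vector(x) * beta[2]);
    return [lp]';
  }
}
data {
  int y[12];
  real x[12];
}
transformed data {
  int ys[3, 4] = { y[1:4], y[5:8], y[9:12] };
  real xs[3, 4] = { x[1:4], x[5:8], x[9:12] };
  vector[0] theta[3];
}
parameters {
  vector[2] beta;
}
model {
  beta ~ std_normal();
  target += sum(map_rect(lr, beta, theta, xs, ys));
}
"""


def fit_with_threads(data, chains=4, threads_per_chain=2):
    """Compile with threading enabled and sample (hypothetical helper).

    Intended total core usage is chains * threads_per_chain, e.g. 4 * 2 = 8.
    """
    import pystan  # imported here so the sketch can be read without PyStan installed

    # Each chain's process reads this variable to size its thread pool.
    os.environ["STAN_NUM_THREADS"] = str(threads_per_chain)
    model = pystan.StanModel(model_code=MODEL_CODE,
                             extra_compile_args=EXTRA_COMPILE_ARGS)
    # One process per chain; threads inside each process handle the shards.
    return model.sampling(data=data, chains=chains, n_jobs=chains)
```

Calling `fit_with_threads({"y": [...], "x": [...]})` with 12 observations each would then run 4 processes with up to 2 threads each. With a model that never calls map_rect, the same settings leave each chain on a single core, which matches what was observed above.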
Found my problem. Thanks a lot. It's not easy to implement map_rect right away; I will do it when I have some spare time.