Could not understand what pystan n_jobs actually does

I never understood what n_jobs does. Suppose I have N chains and I want M cores for each chain, with everything running in parallel (assume N*M is less than the number of cores I have on one node). What I see is that if I set n_jobs >= N*M, nothing interesting happens: the N chains run in parallel on just one core each (not on M cores). What should I set?

OS: tested on both Ubuntu 18.04 and macOS Catalina

n_jobs sets the number of parallel processes. The number actually used is

min(chains, cpu_count)

For parallelism within a chain you need threading. See the docs: https://pystan.readthedocs.io/en/latest/threading_support.html

This example surely did not work. Anyway I fail to understand your answer.

Hi,

what example did not work?

What did you fail to understand? What n_jobs does?

You can try to run multiple chains and test different n_jobs.

For example, what do you see when you run 30 chains with n_jobs set to 1, 2, or 4? How many chains run simultaneously?

There is an example in the link you referred to earlier.

I asked a specific question and did not understand how your answer applies to it, so I will rephrase. I want 4 chains, and I want each chain to run on 2 cores, requiring a total of 4*2=8 cores. My CPU has 12 cores. My questions are as follows:

  1. "I want 4 chains, and I want each chain to run on 2 cores, requiring a total of 4*2=8 cores." Is this doable in PyStan?

  2. Following the example given in the link you shared, I set:

os.environ['STAN_NUM_THREADS'] = "2"
fit4 = sm4.sampling(data=data, iter=steps, chains=5, warmup=burn, thin=thin_by, seed=random_seed, n_jobs=10)

Still, I see 5 chains running in parallel, each using 1 core, so a total of 5 cores are being used!

Instead, if I had set

n_jobs=1

I would see the same result.

Like I said, n_jobs = min(nchain, cpu_cores), so in your case you have 5 chains and n_jobs = 10 --> 5 cores are used.

If you set n_jobs=1, then the chains are processed serially --> one chain is sampled, then the next one.
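The rule described above can be sketched in a few lines of Python. This is only a simplified model of the behaviour discussed in this thread, not PyStan's internal code; the function name busy_cores is made up for illustration.

```python
def busy_cores(chains, n_jobs, cpu_count):
    """Rough model of how many chains are sampled at the same time.

    Each chain runs in its own single-threaded process, so extra
    n_jobs beyond the number of chains buys nothing.
    """
    if n_jobs == -1:  # PyStan 2 convention: -1 means "use all CPUs"
        n_jobs = cpu_count
    return min(chains, n_jobs, cpu_count)


print(busy_cores(chains=5, n_jobs=10, cpu_count=12))  # 5 chains, one core each
print(busy_cores(chains=5, n_jobs=1, cpu_count=12))   # 1: chains run serially
```

Under this model, raising n_jobs above the number of chains never gives a chain a second core; that requires threading inside the model itself.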

If you want to use threading, you also need to specify extra_compile_args as shown in the example. (For threading, your model needs to use the map_rect function.)

Like I said, n_jobs = min(nchain, cpu_cores), so in your case you have 5 chains and n_jobs = 10 → 5 cores are used.
If you set n_jobs=1, then the chains are processed serially → one chain is sampled, then the next one.

Understood.

If you want to use threading, you also need to specify extra_compile_args as shown in the example. (For threading, your model needs to use the map_rect function.)

Which example?

Threading example here https://pystan.readthedocs.io/en/latest/threading_support.html
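For reference, the setup from the linked page can be sketched roughly as follows. This assumes PyStan 2.x; the helper names configure_threads and fit_with_threads are hypothetical, and the flags only have an effect if the Stan program itself calls map_rect.

```python
import os

# Compiler flags that switch on Stan's threading support
# (as given in the linked PyStan 2 threading docs).
THREADING_COMPILE_ARGS = ["-pthread", "-DSTAN_THREADS"]


def configure_threads(threads_per_chain):
    """Set STAN_NUM_THREADS, the per-chain thread limit used by map_rect."""
    os.environ["STAN_NUM_THREADS"] = str(threads_per_chain)
    return os.environ["STAN_NUM_THREADS"]


def fit_with_threads(model_code, data, chains=4, threads_per_chain=2):
    """Hypothetical helper: compile with threading flags, then sample."""
    import pystan  # PyStan 2.x assumed; imported lazily

    configure_threads(threads_per_chain)
    model = pystan.StanModel(
        model_code=model_code,
        extra_compile_args=THREADING_COMPILE_ARGS,
    )
    # One process per chain; each process may use up to
    # threads_per_chain threads inside map_rect calls.
    return model.sampling(data=data, chains=chains, n_jobs=chains)
```

With chains=4 and threads_per_chain=2 this would aim for the 4*2=8 cores asked about earlier in the thread, provided the model spends its time inside map_rect.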

I am not sure whether threading depends on the problem at hand. When I run the example code, I find that all of the cores (here I count cores as 6 physical cores * 2 threads per core = 12) are in use, with moderate load on each core. This happens even if I set os.environ['STAN_NUM_THREADS'] = "2" with chains=4, n_jobs=4.

Now, if I do the same thing with my own problem (which involves far more free variables than the example above), I see only 4 cores being used, each at full load (100%). I don't think I understood anything from this exercise.

Is your function using map_rect? If not, then enabling threading will not help.
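To make the map_rect requirement concrete, here is an untested sketch of what a model using it might look like, held in a Python string as it would be passed to PyStan 2. The model (a normal likelihood split into equal-sized shards), the names shard_lp, MAP_RECT_MODEL and make_shards, and the priors are all made up for illustration; the pattern follows the map_rect example in the Stan User's Guide and uses the older array syntax that PyStan 2 expects.

```python
MAP_RECT_MODEL = """
functions {
  // Log-likelihood of one shard; phi holds the shared parameters (mu, sigma).
  vector shard_lp(vector phi, vector theta, real[] x_r, int[] x_i) {
    return [normal_lpdf(x_r | phi[1], phi[2])]';
  }
}
data {
  int<lower=1> K;        // number of shards
  int<lower=1> J;        // observations per shard
  real y[K, J];          // data, one row per shard
}
transformed data {
  vector[0] theta[K];    // no shard-specific parameters
  int x_i[K, 0];         // no integer data
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  vector[2] phi = [mu, sigma]';
  mu ~ normal(0, 5);
  sigma ~ normal(0, 5);
  // Shards are evaluated in parallel when threading is enabled.
  target += sum(map_rect(shard_lp, phi, theta, y, x_i));
}
"""


def make_shards(y, n_shards):
    """Split y into n_shards equal-length lists (map_rect needs rectangular data)."""
    if len(y) % n_shards != 0:
        raise ValueError("len(y) must be divisible by n_shards")
    size = len(y) // n_shards
    return [list(y[i * size:(i + 1) * size]) for i in range(n_shards)]
```

The data dictionary for such a model would then be something like {"K": 3, "J": 2, "y": make_shards(values, 3)}. A model without a map_rect call gains nothing from STAN_NUM_THREADS, which matches the 4 fully loaded cores reported above.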

Found my problem. Thanks a lot. It's not easy to implement map_rect right away. I will do it when I have some spare time.