Chain parallelization with Stan in Slurm

Hi, I read this post about chain parallelization in Stan on a server with a Slurm scheduler.

I have a model that uses map_rect to speed up computation, and I would like to run 3 chains in parallel. I read this answer to the previous post, but I was wondering: if I am using map_rect, should I specify something like

```
#SBATCH --nodes 1
#SBATCH --ntasks-per-node 3
#SBATCH --cpus-per-task 24
```

with a number of tasks equal to the number of chains that I want to run in parallel, and more than 1 CPU per chain so as to take advantage of map_rect?

Thanks in advance

I would like to add that I am using PyStan 3. I am not sure how to make chains run in parallel, or whether I should specify how many cores each chain should use.

Should I set a Stan environment variable like STAN_NUM_THREADS? The documentation does not make it clear how chain parallelization should be done in PyStan 3.
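
Something like the sketch below is what I have in mind. This is only a guess on my part: it assumes PyStan 3 compiles the model with threading enabled and honors STAN_NUM_THREADS for map_rect, and the model file name is a placeholder.

```python
import os

# Assumption: PyStan 3 honors STAN_NUM_THREADS for map_rect if the model
# is built with threading support; set it before building the model.
os.environ["STAN_NUM_THREADS"] = "24"  # CPUs available to each chain

import stan  # PyStan 3

with open("model.stan") as f:  # placeholder file name
    program_code = f.read()

data = {}  # the model's data dictionary

posterior = stan.build(program_code, data=data, random_seed=1)

# PyStan 3 runs the requested chains as parallel processes by default.
fit = posterior.sample(num_chains=3, num_samples=1000, num_warmup=1000)
```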

One thing you can do is just fire up separate instances of PyStan in different processes. That duplicates the data in memory compared to a multi-threaded run, but I'm not sure whether PyStan has caught up to the internally threaded multiple-chain support in our C++ code.
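
For example, here's a minimal sketch of the one-process-per-chain approach. The model file name and the parameter name theta are placeholders, and each worker rebuilds the model for itself:

```python
import multiprocessing as mp

def run_chain(seed):
    # Build the model and run a single chain inside this worker process.
    import stan  # PyStan 3; imported in the worker

    with open("model.stan") as f:  # placeholder file name
        program_code = f.read()

    data = {}  # the model's data dictionary

    posterior = stan.build(program_code, data=data, random_seed=seed)
    fit = posterior.sample(num_chains=1, num_samples=1000, num_warmup=1000)
    return fit["theta"]  # placeholder parameter; draws pickle back fine

if __name__ == "__main__":
    # One process per chain; each process holds its own copy of the data.
    with mp.Pool(processes=3) as pool:
        draws = pool.map(run_chain, [1, 2, 3])
```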

You might also want to look into CmdStanPy, which runs Stan out of process and communicates via file I/O.
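
In CmdStanPy, threaded map_rect plus parallel chains would look roughly like the sketch below: compile with STAN_THREADS, and threads_per_chain sets STAN_NUM_THREADS for each chain process. The file names are placeholders.

```python
from cmdstanpy import CmdStanModel

# Compile with threading enabled so map_rect can shard work across threads.
model = CmdStanModel(
    stan_file="model.stan",              # placeholder file name
    cpp_options={"STAN_THREADS": True},
)

# Run 3 chains as separate CmdStan processes, each using 24 threads
# for map_rect (threads_per_chain sets STAN_NUM_THREADS per process).
fit = model.sample(
    data="data.json",                    # placeholder data file
    chains=3,
    parallel_chains=3,
    threads_per_chain=24,
)
```

On a Slurm allocation like the one above, you'd want threads_per_chain to match --cpus-per-task so the chains don't oversubscribe the node.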