Chain Parallelization in Stan


I read here that within-chain parallelization is not supported in Stan, for the reasons mentioned there. However, when I run my inferences in Stan I often notice that multiple chains are being processed concurrently. Does this mean Stan supports parallelization between chains? If so, is this done automatically, or does the user need to pass additional arguments when building or compiling the model?

I’m asking because I’d like to parallelize my code across the chains if possible. I’m running on a cluster managed by Slurm. I’m essentially wondering whether I can be as relaxed as simply issuing the Slurm flag --cpus-per-task=10 and have Stan take care of parallelizing the chains, or whether I need to do something more involved.

I think this is somewhat outdated. IIRC, map_rect handles within-chain parallelism.

It’s not so much parallelism between chains, since each chain is independent of the others. There’s no communication between the chains.

It’s a little more involved, but not by that much. If you aren’t bothering with map_rect, then you can tell your Stan implementation to create as many chains as you have cores available. That may involve, say, passing $SLURM_CPUS_ON_NODE to Stan somehow, which will depend on which implementation you use, e.g. RStan, PyStan, CmdStan, etc.
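For CmdStan specifically, one way to do this from the job script is to launch one chain per granted CPU and let the scheduler's environment variable drive the chain count. A minimal sketch, where the compiled model binary `my_model` and the data file name are placeholders:

```shell
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4

# Fall back to 4 chains if run outside Slurm; SLURM_CPUS_PER_TASK is
# set by Slurm to match the --cpus-per-task request.
NCHAINS=${SLURM_CPUS_PER_TASK:-4}

# Launch one CmdStan chain per CPU as a background process,
# then wait for all of them to finish.
for i in $(seq 1 "$NCHAINS"); do
  ./my_model sample \
      data file=data.json \
      output file="output_${i}.csv" &
done
wait
```

Here the OS spreads the background processes across the cores Slurm allocated, which matches the "one chain per core" behavior described above.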

Oh I see, thank you. Instead of passing Slurm environment variables through, can’t I just request 4 chains in Stan and 4 CPUs from Slurm?

What I want to achieve is 4 chains running concurrently on 4 CPUs.

Sure, that would work. The one catch is that you’d have to manually make sure that Slurm and Stan were using the same number of CPUs.
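One low-effort way to keep the two in sync is to have a single source of truth: read the Slurm-granted CPU count from the environment and pass it to Stan. A hedged sketch for RStan (the model file `model.stan` is a placeholder):

```shell
#!/bin/bash
#SBATCH --cpus-per-task=4

# Read the same value Slurm granted, so chains and CPUs can never drift apart.
Rscript -e '
  library(rstan)
  n <- as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", "4"))
  fit <- stan("model.stan", chains = n, cores = n)
'
```

If you later change --cpus-per-task, the chain count follows automatically.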

Yeah of course! But I don’t plan to alter that value often. So, for my own edification, Stan automatically recognizes the CPU resources available to it and distributes independent chain processes across them, with a 1-to-1 correspondence? That’s quite amazing!

I think it’s actually the underlying OS that does that. AFAIK, the Stan implementations RStan and PyStan can put each chain on its own thread, and it’s the OS’s responsibility to figure out what threads go on what CPU cores. Typically, though, there will be 1 chain per core in practice.

I see, thanks for that. Last question: I should be requesting the parallel partition of Slurm with cpus-per-task=4, right?

I’m not sure, since I have no experience with Slurm myself. I presume it’s similar to other cluster schedulers I’ve used, i.e., you write a job script that uses special comments and environment variables and submit it to the scheduler via some command-line utility.

I’m not sure why this got flagged, but I’m guessing it was by accident. It shows up in my to-review queue, and I’m not sure whether a thumbs-up agrees with the flag or a thumbs-down does, so I’m afraid to touch it.


Was this an inappropriate or spam question? It was meant for my own education, and I posted it under General so as not to take anyone’s time immediately.


I don’t think so at all!

From the Stack Overflow answer below, assuming you are using parallelism with RStan or PyStan:

sbatch --ntasks 1 --cpus-per-task 24 #script_here

Though I haven’t used Slurm in a long time.


No. We’re OK with any kind of question.