I am trying to run my Stan model with multithreading, but I saw that it's a bit complicated with pystan 2.18+. From this and other forums, it seems official multithreading is introduced in pystan 3 (still in beta), so I've decided to try it. Since my model is really big, my goal is to parallelize each chain, submitting each chain to multiple CPUs. However, I couldn't find anywhere how to implement this in pystan 3.
Can someone here point me in the right direction?

That's my goal, and I can allocate a generous number of CPUs to each chain. In pystan 2.19, even when I follow this example, I still have only 4 CPUs running (1 per chain). So something isn't working right…
Update: I use pystan on a CentOS 7 Linux server.
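For reference, this is a minimal sketch of how within-chain threading is usually enabled in pystan 2: the `STAN_NUM_THREADS` environment variable and the `-DSTAN_THREADS` compile flag are the documented mechanism, but the Stan program itself must also shard its likelihood with `map_rect` for the extra CPUs to be used at all, which is a common reason only 1 CPU per chain shows up. The thread count and the `model_code`/`data` names are placeholders:

```python
import os

# Tell the Stan runtime how many threads each chain may use.
# This must be set before the model is compiled and sampled.
os.environ["STAN_NUM_THREADS"] = "16"  # placeholder thread count

# pystan 2: compile with threading enabled; without map_rect() in the
# Stan program itself, each chain still runs on a single CPU.
#
#   import pystan
#   sm = pystan.StanModel(
#       model_code=model_code,              # placeholder Stan program
#       extra_compile_args=["-pthread", "-DSTAN_THREADS"],
#   )
#   fit = sm.sampling(data=data, chains=4)  # placeholder data
```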

OK, thanks! One last question: from reading more, I've seen some folks suggest increasing the number of chains and reducing the number of iterations per chain to speed things up. Is that really recommended?
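One thing worth keeping in mind about that trade-off: warmup is paid once per chain, so adding chains while shrinking post-warmup draws does not shrink total work proportionally. A back-of-the-envelope comparison with illustrative numbers:

```python
def total_iterations(chains, draws_per_chain, warmup=1000):
    """Total iterations across all chains, warmup included.

    Illustrative numbers only: each chain runs its own warmup,
    so more chains means more total warmup work.
    """
    return chains * (warmup + draws_per_chain)

# 4 chains x 1000 draws vs 16 chains x 250 draws:
# both yield 4000 post-warmup draws, but warmup is paid 16 times
# in the second case instead of 4.
few_chains = total_iterations(4, 1000)    # 8000 total iterations
many_chains = total_iterations(16, 250)   # 20000 total iterations
```

More chains do parallelize trivially across CPUs, which is why the advice comes up, but the per-chain warmup overhead is the catch.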

I hope someone will pick this up: my really large model has been stuck in the first set of warmup iterations for 8 hours. Any ideas what this means and how I can solve it?

Thanks, Ari!
It's not huge in parameters: I'm fitting ~40 of them. But the data is pretty large: ~1,200 observations (including missing data), each with ~10 different tasks with hundreds of trials each. So I guess that's a lot.
Now the weird part: in the original model I loop over trials within each task, and to make it more efficient I vectorized it, BUT it takes more time after vectorization… hmmm.

Gradient evaluation took 0.39 seconds
1000 transitions using 10 leapfrog steps per transition would take 3900 seconds.
Adjust your expectations accordingly!
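Stan's 3900-second figure is just the measured gradient time scaled up by an assumed leapfrog count; as a sanity check, the arithmetic (nothing model-specific) is:

```python
grad_time = 0.39     # seconds per gradient evaluation, from the log above
leapfrog_steps = 10  # fixed per-transition count Stan assumes in its estimate
transitions = 1000

# Each leapfrog step needs one gradient evaluation, so:
estimate = grad_time * leapfrog_steps * transitions  # roughly 3900 seconds
```

In practice NUTS adapts the number of leapfrog steps per transition, so the real runtime can be much higher or lower than this rough estimate.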

I wouldn't even know where to begin: it's 600 lines of code… I'll start by reparameterizing it and implementing the within-chain parallelization. Thanks a lot!