CmdStanPy: multithreading issues (threads_per_chain)

Hi all,

I've been trying to run a Stan model with cmdstanpy (it's my first time using it) as 4 chains on 16 cores. I want my chains to be executed in parallel (cpp_options={'STAN_THREADS': 'TRUE'} is specified, and this works). Ideally, each chain would use 4 cores, but when I specify threads_per_chain=4 in sample() and run my code, it only uses 4 cores (checked in Activity Monitor on Mac and with htop on Linux) when I would want it to use 16. I have also tried setting os.environ["STAN_NUM_THREADS"] = str(16), but this does not seem to have any effect.

Currently my code looks like this:

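from cmdstanpy import CmdStanModel

nc = 4       # number of chains
threads = 4  # threads per chain

# stan_code, d, nw, ni and thin are not shown here
model = CmdStanModel(stan_file=stan_code, cpp_options={'STAN_THREADS': 'TRUE'}, force_compile=True)

fit = model.sample(data=d, threads_per_chain=threads, chains=nc, iter_warmup=nw,
                   iter_sampling=ni, seed=305, thin=thin)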
  • Operating System: macOS Ventura 13.5.1 and Linux 4.18.0-477.27.1.el8_8.x86_64
  • CmdStan Version: cmdstanpy 1.2.0, cmdstan 2.31.0

Any help would be appreciated.

threads_per_chain: cf. the API Reference in the CmdStanPy 1.2.0 documentation:

The number of threads to use in parallelized sections within an MCMC chain (e.g., when using the Stan functions reduce_sum() or map_rect()).

Is either map_rect() or reduce_sum() being used in your model?
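
One rough way to answer that question for a given program is to scan its source text for those calls. This is only a sketch: uses_within_chain_parallelism is a hypothetical helper, not part of CmdStanPy, and a plain text match can be fooled by a name that appears only in a comment.

```python
import re

# Hypothetical helper (not part of CmdStanPy): crude text scan for the
# Stan functions that can use several threads inside a single chain.
_PARALLEL_CALL = re.compile(r"\b(reduce_sum(?:_static)?|map_rect)\s*\(")


def uses_within_chain_parallelism(stan_source: str) -> bool:
    """Return True if the source text appears to call reduce_sum/map_rect."""
    return _PARALLEL_CALL.search(stan_source) is not None


# Two toy model blocks, made up purely for illustration.
plain_model = "model { y ~ normal(mu, sigma); }"
threaded_model = "model { target += reduce_sum(partial_sum, y, 1, mu); }"

print(uses_within_chain_parallelism(plain_model))     # False
print(uses_within_chain_parallelism(threaded_model))  # True
```

If this returns False, the documentation quoted above suggests that threads_per_chain has nothing to parallelize, which would match seeing one busy core per chain.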


No, I was not. I added map_rect() to my Stan code and it seems to work now :)
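
For anyone landing here later, a minimal sketch of what a model with within-chain parallelism can look like from the Python side. Note the assumptions: this uses reduce_sum() rather than the map_rect() mentioned above, the logistic regression program is illustrative and not the poster's model, and the names fit_threaded and total_threads are made up for this example. The cmdstanpy import is done inside the function so the snippet loads even without CmdStan installed.

```python
import tempfile
from pathlib import Path

# Illustrative Stan program (an assumption, not the poster's model):
# a logistic regression whose likelihood is split up with reduce_sum().
STAN_PROGRAM = """
functions {
  real partial_sum(array[] int y_slice, int start, int end,
                   vector x, vector beta) {
    return bernoulli_logit_lpmf(y_slice | beta[1] + beta[2] * x[start:end]);
  }
}
data {
  int<lower=0> N;
  array[N] int<lower=0, upper=1> y;
  vector[N] x;
  int<lower=1> grainsize;
}
parameters {
  vector[2] beta;
}
model {
  beta ~ std_normal();
  target += reduce_sum(partial_sum, y, grainsize, x, beta);
}
"""


def total_threads(chains: int, threads_per_chain: int) -> int:
    """Threads in use when all chains run at the same time."""
    return chains * threads_per_chain


def fit_threaded(data: dict, chains: int = 4, threads_per_chain: int = 4):
    """Compile with STAN_THREADS and sample; needs CmdStanPy and CmdStan."""
    from cmdstanpy import CmdStanModel  # lazy import, see note above

    stan_file = Path(tempfile.mkdtemp()) / "logistic_reduce_sum.stan"
    stan_file.write_text(STAN_PROGRAM)
    model = CmdStanModel(stan_file=str(stan_file),
                         cpp_options={"STAN_THREADS": "TRUE"})
    # parallel_chains runs the chains side by side; threads_per_chain is
    # the thread count available to reduce_sum() inside each chain.
    return model.sample(data=data, chains=chains, parallel_chains=chains,
                        threads_per_chain=threads_per_chain, seed=305)


print(total_threads(4, 4))  # 16
```

With 4 chains and threads_per_chain=4 this asks for 16 threads in total, which is the core count the original post was aiming for.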
