Cmdstanr doesn't use multiple compute threads for each core

I am running a Windows machine with 8 cores and 2 threads/core for a total of 16 compute threads.

> library(parallel)
> detectCores()
[1] 16

I have followed the Stan User Manual instructions and in Windows Powershell set:

Set-Variable -Name STAN_NUM_THREADS -Value "16"

Yet, when I estimate a within-chain parallelized model using reduce_sum(), with the following command,

mod_to_run    <- cmdstan_model('mod.stan',
                                cpp_options = list(stan_threads = TRUE))
Sest           <- mod_to_run$sample(data = standata_Si, init = 1.0, 
                           chains = 1, iter_sampling = 1500, iter_warmup = 1500, thin = 2,
                           adapt_delta = 0.98, max_treedepth=13, threads_per_chain =  8)

The Windows Task Manager shows 100% of my processor in use, which suggests to me that the 8 cores are recognized, but not the 16 threads.

Would you be able to clarify what the issue here is? Is it that Stan is not using 16 threads, or is it that you want to use multiple threads in a single core rather than equally distributing them amongst the cores (i.e., 4 cores using 2 threads each, rather than 8 cores using 1 thread each)?

Hi and thanks for your reply. I want to be able to use up to 16 threads, but cmdstan is not recognizing that I have 16 threads. If I choose 8 or more threads_per_chain, the estimation uses 100% of my CPU. I would expect my CPU usage to be 50% at threads_per_chain = 8.

That’s more of a quirk of Windows’ CPU monitoring than of Stan’s threading. If the Task Manager were monitoring on a truly per-thread scale, then no single-threaded process could exceed 100% / 16 = 6.25%. You can see this with all types of parallelism on your system, including other R functions.
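To make that arithmetic concrete, here is a minimal sketch. The helper name `expected_share` is made up for illustration, and it assumes a monitor that counts each of the 16 logical threads equally, which is the assumption under discussion rather than a statement about how Task Manager actually computes its figure.

```r
# Hypothetical helper (not part of cmdstanr or base R): the share of total
# CPU a process would show if every logical thread were counted equally.
expected_share <- function(busy_threads, logical_threads = 16) {
  100 * busy_threads / logical_threads
}

expected_share(1)  # one busy thread out of 16 logical threads
expected_share(8)  # eight busy threads out of 16 logical threads
```

Under that assumption a single-threaded process would top out at 6.25%, and 8 busy threads would read as 50%.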

I’m not sure. If I open a single R session and run it without invoking multithreading, what you describe is exactly what I see for the CPU usage of that R session. With multithreading enabled, I would expect the CPU usage for that R session to be 50% if I’m using 8 threads (of the 16 total), rather than the 100% I see in Task Manager. It is especially important to note that if I use fewer than 8 threads, Task Manager reports CPU usage in the correct proportion (e.g., 75% for 6 threads).

A key thing to keep in mind is that hyperthreading is essentially just a mechanism for a single CPU core to work on two tasks at the same time, by intelligently ‘switching’ between them. It’s not really an additional core.

If there is only one task for a given core, you would want the entirety of that core to work on it, rather than holding back to ‘reserve’ CPU cycles just in case another task arrives.

Essentially, the Task Manager is reporting 100% because every core is entirely focused on its given task. If you were to increase to 16 threads, each core would then switch back and forth between the tasks for its two threads.
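If it helps, the distinction between physical cores and logical (hyperthreaded) processors can be checked from R. This is only a sketch: `detectCores(logical = FALSE)` is documented to report physical cores on Windows and macOS, but it may return NA or the logical count on other platforms, and the variable `threads_to_use` is an illustrative name, not something cmdstanr requires.

```r
library(parallel)

# Logical processors (hardware threads), e.g. the 16 reported earlier.
n_logical <- detectCores(logical = TRUE)

# Physical cores; can be NA where R cannot determine the count.
n_physical <- detectCores(logical = FALSE)

# Assumption for illustration: aim for one thread per physical core,
# falling back to the logical count when the physical count is unknown.
threads_to_use <- if (is.na(n_physical)) n_logical else n_physical

cat("logical:", n_logical, " physical:", n_physical,
    " threads_to_use:", threads_to_use, "\n")
```

On a machine like the one described above, this would be expected to show 16 logical processors and 8 physical cores, so a value such as `threads_per_chain = 8` already gives each physical core one thread to work on.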