Optimal num_stan_threads when using multiple chains

eee · May 30, 2019, 12:06pm

I thought I was having the same issue. I'm running Stan in Docker (DigitalOcean) with 6 vCPUs (Intel Xeon Gold 6140 @ 2300 MHz, 25 MB cache). I was hoping to use 16 vCPUs, but even with 6 I don't get more than 66% CPU utilization.

I made sure I had the correct flags in $HOME/.R/Makevars:
CXXFLAGS=-O3 -mtune=native -march=native -Wno-unused-variable -Wno-unused-function -flto -ffat-lto-objects -Wno-unused-local-typedefs -Wno-ignored-attributes -Wno-deprecated-declarations
and these settings in R:
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)
Sys.setenv(LOCAL_CPPFLAGS = '-march=native')

I get 4 chains on 4 CPUs at 100%, but the other 2 CPUs are totally unused.
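The arithmetic behind that 66% figure, and one possible way around it, can be sketched in R. This is only a rough sketch: `plan_threads` is a hypothetical helper, not part of rstan, and the `STAN_NUM_THREADS` setting is assumed to matter only when the model was compiled with threading enabled (`-DSTAN_THREADS`) and actually uses `map_rect`; otherwise each chain stays on a single core.

```r
# Hypothetical helper (not part of rstan): split the available cores between
# parallel chains and within-chain threads, and report what is left idle.
plan_threads <- function(cores, chains) {
  parallel_chains <- min(chains, cores)                    # one process per chain
  threads_per_chain <- max(1L, cores %/% parallel_chains)  # whole threads only
  list(
    parallel_chains = parallel_chains,
    threads_per_chain = threads_per_chain,
    idle_cores = cores - parallel_chains * threads_per_chain
  )
}

# The situation described above: 6 cores, 4 chains
plan_4 <- plan_threads(cores = 6L, chains = 4L)

# An alternative layout: 3 chains, each allowed 2 threads
plan_3 <- plan_threads(cores = 6L, chains = 3L)

# Applying the 3-chain plan (assumes a threading-enabled model using map_rect)
options(mc.cores = plan_3$parallel_chains)
Sys.setenv(STAN_NUM_THREADS = plan_3$threads_per_chain)
```

With 4 chains on 6 cores the integer division leaves only 1 thread per chain and 2 cores idle, which matches 4/6 ≈ 66% utilization. Running 3 chains with 2 threads each (or 2 chains with 3 threads each) would cover all 6 cores, at the cost of fewer chains.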

This seems to be a limitation of Stan, as noticed by @Mike_Terrell:

It would save a lot of time if we could split 1 chain over 2 CPUs (or even 3) and, hopefully, divide the sampling time accordingly. But as @Bob_Carpenter points out:
