Map_rect spawns more threads than requested

lin.wang.idd.pasteur · January 19, 2021, 4:43pm

I’m using map_rect for within-chain parallelization on an HPC cluster (Linux). First, I’m trying to run 2 chains on 2 cores, using the following settings for RStan:

options(mc.cores = 8);
stan( chains = 2, cores = 2, ... );

Then, in the SLURM script, I requested 10 CPUs:

#SBATCH --nodes=1
#SBATCH --cpus-per-task=10
#SBATCH --ntasks=1

However, according to top on the HPC, my RStan model seems to spawn more than 900 threads, most of which are sleeping, so the model runs very slowly.

Could you advise how to set up multi-threading for map_rect correctly?

I also noticed several posts mentioning CmdStanR. Would it be a good idea to switch from RStan to CmdStanR for better and easier within-chain parallelization?
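One way to keep the thread count in line with the allocation is to derive it from the SLURM environment rather than hard-coding it. The sketch below is illustrative only: threads_per_chain_for is a hypothetical helper (not part of RStan or CmdStanR), and it assumes the job exposes SLURM_CPUS_PER_TASK and that the model was built with threading enabled, so that map_rect reads its thread count from the STAN_NUM_THREADS environment variable. It does not by itself explain the >900 threads reported above.

```r
# Hypothetical helper (not part of RStan or CmdStanR): split the CPUs granted
# by SLURM evenly across chains so that chains * threads <= allocated CPUs.
threads_per_chain_for <- function(n_chains,
                                  cpus = Sys.getenv("SLURM_CPUS_PER_TASK", unset = "")) {
  cpus <- suppressWarnings(as.integer(cpus))
  if (is.na(cpus) || cpus < 1L) {
    # Outside a SLURM job (or if the variable is unset), fall back to one thread.
    return(1L)
  }
  max(1L, cpus %/% as.integer(n_chains))
}

# With --cpus-per-task=10 and 2 chains, each chain gets 5 threads.
n_threads <- threads_per_chain_for(n_chains = 2, cpus = "10")

# Assumption: a threading-enabled build picks this variable up at run time.
Sys.setenv(STAN_NUM_THREADS = n_threads)
```

Inside a real job, the cpus argument can be left at its default so the value comes from the scheduler.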

wds15 · January 19, 2021, 5:01pm

Yes, it’s better to use cmdstanr for this and also use the latest cmdstan.

lin.wang.idd.pasteur · January 22, 2021, 7:01pm

Can we simply ignore the reduce_sum warning below if the code runs?

The variable partial_sum may not have been assigned a value before its use.

wds15 · January 22, 2021, 8:04pm

Can you post more information, please? This sounds odd to me.

lin.wang.idd.pasteur · January 24, 2021, 11:24am

Thanks for responding @wds15! In R 4.0.3, I used the script below to compile my Stan model:

library(cmdstanr)
mod_file = file.path(getwd(), "baye_seroFit_demo.stan", fsep = .Platform$file.sep)

mod <- cmdstan_model(
  stan_file = mod_file,
  compile = TRUE,
  quiet = FALSE,
  dir = getwd(),
  pedantic = TRUE,
  cpp_options = list(stan_threads = TRUE)
)

After compilation, I received the following warning messages:

Compiling Stan program...

--- Translating Stan model to C++ code ---
bin/stanc.exe --warn-pedantic --name='baye_seroFit_demo_model' --o=C:/Users/Lin/AppData/Local/Temp/Rtmpk7MNNq/model-2a2421ee41b8.hpp C:/Users/Lin/AppData/Local/Temp/Rtmpk7MNNq/model-2a2421ee41b8.stan
Warning:
  The parameter logQS_relaRho_prewin_vec has no priors.
Warning:
  The parameter logQS_rho_K2nd_lastwin_vec has no priors.
Warning:
  The parameter log_crossProtect has no priors.
Warning:
  The parameter log_relaFOI_samp_Mat has no priors.
Warning:
  The parameter log_sumFOI_vec has no priors.
Warning at 'C:/Users/Lin/AppData/Local/Temp/Rtmpk7MNNq/model-2a2421ee41b8.stan', line 362, column 4 to column 15:
  The variable partial_sum may not have been assigned a value before its use.
Warning:
  The parameter logQS_relaRho_prewin_vec has no priors.
Warning:
  The parameter logQS_rho_K2nd_lastwin_vec has no priors.
Warning:
  The parameter log_crossProtect has no priors.
Warning:
  The parameter log_relaFOI_samp_Mat has no priors.
Warning:
  The parameter log_sumFOI_vec has no priors.
Warning at 'C:/Users/Lin/AppData/Local/Temp/Rtmpk7MNNq/model-2a2421ee41b8.stan', line 362, column 4 to column 15:
  The variable partial_sum may not have been assigned a value before its use.

--- Compiling, linking C++ code ---
g++ -std=c++1y -m64 -D_REENTRANT -Wall -Wno-unused-function -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-variable -Wno-sign-compare -Wno-unused-local-typedefs -Wno-int-in-bool-context -Wno-attributes -Wno-ignored-attributes     -DSTAN_THREADS -I stan/lib/stan_math/lib/tbb_2019_U8/include   -O3 -I src -I stan/src -I lib/rapidjson_1.1.0/ -I stan/lib/stan_math/ -I stan/lib/stan_math/lib/eigen_3.3.7 -I stan/lib/stan_math/lib/boost_1.72.0 -I stan/lib/stan_math/lib/sundials_5.2.0/include  -D_USE_MATH_DEFINES  -DBOOST_DISABLE_ASSERTS        -c -Wno-ignored-attributes   -x c++ -o C:/Users/Lin/AppData/Local/Temp/Rtmpk7MNNq/model-2a2421ee41b8.o C:/Users/Lin/AppData/Local/Temp/Rtmpk7MNNq/model-2a2421ee41b8.hpp
g++ -std=c++1y -m64 -D_REENTRANT -Wall -Wno-unused-function -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-variable -Wno-sign-compare -Wno-unused-local-typedefs -Wno-int-in-bool-context -Wno-attributes -Wno-ignored-attributes     -DSTAN_THREADS -I stan/lib/stan_math/lib/tbb_2019_U8/include   -O3 -I src -I stan/src -I lib/rapidjson_1.1.0/ -I stan/lib/stan_math/ -I stan/lib/stan_math/lib/eigen_3.3.7 -I stan/lib/stan_math/lib/boost_1.72.0 -I stan/lib/stan_math/lib/sundials_5.2.0/include  -D_USE_MATH_DEFINES  -DBOOST_DISABLE_ASSERTS              -Wl,-L,"C:/Users/Lin/Documents/.cmdstanr/cmdstan-2.25.0/stan/lib/stan_math/lib/tbb" -Wl,-rpath,"C:/Users/Lin/Documents/.cmdstanr/cmdstan-2.25.0/stan/lib/stan_math/lib/tbb"      C:/Users/Lin/AppData/Local/Temp/Rtmpk7MNNq/model-2a2421ee41b8.o src/cmdstan/main_threads.o  -static-libgcc -static-libstdc++       stan/lib/stan_math/lib/sundials_5.2.0/lib/libsundials_nvecserial.a stan/lib/stan_math/lib/sundials_5.2.0/lib/libsundials_cvodes.a stan/lib/stan_math/lib/sundials_5.2.0/lib/libsundials_idas.a stan/lib/stan_math/lib/sundials_5.2.0/lib/libsundials_kinsol.a  stan/lib/stan_math/lib/tbb/tbb.dll -o C:/Users/Lin/AppData/Local/Temp/Rtmpk7MNNq/model-2a2421ee41b8.exe
rm -f C:/Users/Lin/AppData/Local/Temp/Rtmpk7MNNq/model-2a2421ee41b8.o

The warning “The variable partial_sum may not have been assigned a value before its use.” confuses and worries me. Is it caused by the option pedantic = TRUE in cmdstan_model? Can we ignore it if sampling runs?

wds15 · January 24, 2021, 11:52am

It looks like that is due to pedantic mode. From a quick skim, I don’t see an issue with partial_sum.

lin.wang.idd.pasteur · January 24, 2021, 11:57am

Also, the “Compile a Stan program” documentation page does not seem to give a full list of name-value pairs for cpp_options and stanc_options. I searched online briefly but could not find a page that lists them. Could you give some suggestions?

I’m trying to run my model on this HPC cluster. We have several options for computing resources:

CPU cluster. Each node has 32 or 56 CPU cores (2.6GHz).
KNL cluster. Each node has 256 logical CPUs (1.30GHz).
GPU cluster. Each node has 4 NVIDIA P100 GPUs.

Using a single thread on the CPU cluster is pathologically slow, so I’m trying reduce_sum for within-chain parallelization. To set up reduce_sum on the CPU cluster, is it enough to add stan_threads = TRUE to cpp_options?

wds15 · January 24, 2021, 12:02pm

@rok_cesnovar …can you help here?

rok_cesnovar · January 24, 2021, 12:27pm

Thanks for tagging, missed this.

Responding on my phone, so apologies for the bad formatting. The model compiles fine. The warnings come from pedantic mode, and the partial_sum warning is a bug in pedantic mode, so just ignore it. If it annoys you, remove warn-pedantic from stanc_options.

The missing list is intentional: it is documented elsewhere, and duplicating it would be redundant and would just lead to two places for the same docs and, eventually, outdated docs. We do need to link to those docs a bit better.

The full list of name-value pairs for cpp_options is anything you can place in the make/local file, most importantly STAN_THREADS, STAN_MPI, … You can also put cxxflags and ldflags here if needed.

The full list of stanc_options is all arguments for the Stan-to-C++ translator (stanc3). See the list in section 15.3 here: 15 stanc: Translating Stan to C++ | CmdStan User’s Guide

rok_cesnovar · January 24, 2021, 12:31pm

So the bottom line is: compile with STAN_THREADS=true in cpp_options and pass threads_per_chain=X to the $sample() call, where X is the number of threads.

Also see the Reduce Sum: A Minimal Example case study on reduce_sum by @wds15 and @bbbales2.
It also uses cmdstanr, so you can just replicate it.
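The two steps above can be sketched as a small wrapper. This is a minimal illustration, not the case study's code: fit_with_threads, stan_file, and stan_data are hypothetical names, and the option is written in the lowercase form stan_threads used in the compile call earlier in the thread. It assumes cmdstanr and a working CmdStan installation are available when the function is called.

```r
# Sketch only: `fit_with_threads`, `stan_file` and `stan_data` are
# placeholders for illustration, not names from cmdstanr or the case study.
fit_with_threads <- function(stan_file, stan_data,
                             chains = 2, threads_per_chain = 4) {
  # Step 1: compile with threading support enabled.
  mod <- cmdstanr::cmdstan_model(
    stan_file   = stan_file,
    cpp_options = list(stan_threads = TRUE)
  )
  # Step 2: request the number of threads each chain may use.
  mod$sample(
    data              = stan_data,
    chains            = chains,
    parallel_chains   = chains,
    threads_per_chain = threads_per_chain
  )
}
```

A call such as fit_with_threads("baye_seroFit_demo.stan", stan_data, chains = 2, threads_per_chain = 5) would then use up to 2 x 5 = 10 threads, which matches a 10-CPU allocation.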

lin.wang.idd.pasteur · January 24, 2021, 12:50pm

Thanks very much for the suggestion. Then I can live with that warning and test on the cluster now!

lin.wang.idd.pasteur · January 25, 2021, 12:56pm

Compiling with cpp_options = list(STAN_THREADS = T) produces the warning “'threads_per_chain' is set but the model was not compiled with 'cpp_options = list(stan_threads = TRUE)' so 'threads_per_chain' will have no effect!”, although multi-threading seems to work.

Compiling with cpp_options = list(stan_threads = TRUE) does not give this warning, and the executable gets the suffix “_threads”.

rok_cesnovar · January 25, 2021, 12:58pm

This might be a false-positive warning in cmdstanr. I will take a look. You can confirm that multi-threading worked by checking fit$metadata(). It should have a num_threads element.

lin.wang.idd.pasteur · January 25, 2021, 1:27pm

cpp_options = list(STAN_THREADS = T) and cpp_options = list(STAN_THREADS = TRUE) both give

$threads_per_chain
[1] 1
$num_thread
[1] 1

cpp_options = list(stan_threads = TRUE) gives

$threads_per_chain
[1] 31
$num_thread
[1] 31
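The comparison above can be automated so that a silently single-threaded run is caught early. The sketch below is illustrative: check_threads is a hypothetical helper, and it assumes the list returned by fit$metadata() carries a threads_per_chain element, as in the output shown in this post.

```r
# Hypothetical helper: compare the requested threads per chain with the
# value reported in the fit metadata.
check_threads <- function(metadata, requested) {
  # [[ ]] avoids the partial name matching that `$` performs on lists.
  used <- metadata[["threads_per_chain"]]
  if (is.null(used)) {
    stop("metadata has no 'threads_per_chain' element")
  }
  ok <- as.integer(used) == as.integer(requested)
  if (!ok) {
    warning(sprintf("requested %d threads per chain, metadata reports %d",
                    as.integer(requested), as.integer(used)))
  }
  invisible(ok)
}

# Stand-ins for fit$metadata(), mirroring the two outputs reported above.
meta_lowercase <- list(threads_per_chain = 31, num_thread = 31)
meta_uppercase <- list(threads_per_chain = 1, num_thread = 1)
```

With a real fit, the call would be check_threads(fit$metadata(), requested = 31). For the uppercase-option case above it warns and returns FALSE.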
