map_rect spawns far more threads than requested

I’m using map_rect for within-chain parallelization on an HPC cluster (Linux). As a first test, I’m trying to run 2 chains on 2 cores with the following RStan settings:

options(mc.cores = 8)
stan(chains = 2, cores = 2, ...)

Then, in the Slurm script, I requested 10 CPUs:

#SBATCH --nodes=1
#SBATCH --cpus-per-task=10
#SBATCH --ntasks=1
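One thing worth checking in a setup like this is the thread budget: the total number of threads (chains times threads per chain) should not exceed the CPUs requested from Slurm, and in the settings above mc.cores = 8, cores = 2 and --cpus-per-task=10 do not agree with each other. The sketch below is a hypothetical helper, not part of rstan or cmdstanr; it assumes the job reads its CPU count from SLURM_CPUS_PER_TASK, the environment variable Slurm sets when --cpus-per-task is given.

```r
# Hypothetical helper (not part of cmdstanr or rstan): divide the CPUs that
# Slurm granted to this job evenly across chains, so that
# chains * threads_per_chain never exceeds --cpus-per-task.
threads_per_chain_for <- function(chains,
                                  cpus = Sys.getenv("SLURM_CPUS_PER_TASK", unset = "")) {
  cpus <- suppressWarnings(as.integer(cpus))
  # Outside a Slurm job the variable is unset; fall back to a single CPU
  # rather than guessing how many cores the machine may use.
  if (is.na(cpus) || cpus < 1L) cpus <- 1L
  max(1L, cpus %/% as.integer(chains))
}

# With #SBATCH --cpus-per-task=10 and 2 chains this gives 5 threads per chain.
threads_per_chain_for(chains = 2, cpus = "10")
```

Integer division leaves any remainder CPUs idle; that is a deliberate choice to avoid oversubscribing the allocation.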

However, top on the cluster shows that my RStan model spawns more than 900 threads, most of them sleeping, and the model runs very slowly.

Could you advise how to set up multi-threading for map_rect correctly?

I also noticed several posts mentioning CmdStanR. Is it worth switching from RStan to CmdStanR for easier within-chain parallelization?

Yes, it’s better to use cmdstanr for this, together with the latest CmdStan.


Can we simply ignore the following reduce_sum warning if the code runs?

The variable partial_sum may not have been assigned a value before its use.

Can you post more information, please? This sounds odd to me.

Thanks for responding, @wds15! In R 4.0.3, I used the following script to compile my Stan model:

library(cmdstanr)
mod_file <- file.path(getwd(), "baye_seroFit_demo.stan")

mod <- cmdstan_model(stan_file = mod_file, compile = TRUE, quiet = FALSE, dir = getwd(),
                     pedantic = TRUE, cpp_options = list(stan_threads = TRUE))

After compilation, I received the following warning messages:

Compiling Stan program...

--- Translating Stan model to C++ code ---
bin/stanc.exe --warn-pedantic --name='baye_seroFit_demo_model' --o=C:/Users/Lin/AppData/Local/Temp/Rtmpk7MNNq/model-2a2421ee41b8.hpp C:/Users/Lin/AppData/Local/Temp/Rtmpk7MNNq/model-2a2421ee41b8.stan
Warning:
  The parameter logQS_relaRho_prewin_vec has no priors.
Warning:
  The parameter logQS_rho_K2nd_lastwin_vec has no priors.
Warning:
  The parameter log_crossProtect has no priors.
Warning:
  The parameter log_relaFOI_samp_Mat has no priors.
Warning:
  The parameter log_sumFOI_vec has no priors.
Warning at 'C:/Users/Lin/AppData/Local/Temp/Rtmpk7MNNq/model-2a2421ee41b8.stan', line 362, column 4 to column 15:
  The variable partial_sum may not have been assigned a value before its use.
Warning:
  The parameter logQS_relaRho_prewin_vec has no priors.
Warning:
  The parameter logQS_rho_K2nd_lastwin_vec has no priors.
Warning:
  The parameter log_crossProtect has no priors.
Warning:
  The parameter log_relaFOI_samp_Mat has no priors.
Warning:
  The parameter log_sumFOI_vec has no priors.
Warning at 'C:/Users/Lin/AppData/Local/Temp/Rtmpk7MNNq/model-2a2421ee41b8.stan', line 362, column 4 to column 15:
  The variable partial_sum may not have been assigned a value before its use.

--- Compiling, linking C++ code ---
g++ -std=c++1y -m64 -D_REENTRANT -Wall -Wno-unused-function -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-variable -Wno-sign-compare -Wno-unused-local-typedefs -Wno-int-in-bool-context -Wno-attributes -Wno-ignored-attributes     -DSTAN_THREADS -I stan/lib/stan_math/lib/tbb_2019_U8/include   -O3 -I src -I stan/src -I lib/rapidjson_1.1.0/ -I stan/lib/stan_math/ -I stan/lib/stan_math/lib/eigen_3.3.7 -I stan/lib/stan_math/lib/boost_1.72.0 -I stan/lib/stan_math/lib/sundials_5.2.0/include  -D_USE_MATH_DEFINES  -DBOOST_DISABLE_ASSERTS        -c -Wno-ignored-attributes   -x c++ -o C:/Users/Lin/AppData/Local/Temp/Rtmpk7MNNq/model-2a2421ee41b8.o C:/Users/Lin/AppData/Local/Temp/Rtmpk7MNNq/model-2a2421ee41b8.hpp
g++ -std=c++1y -m64 -D_REENTRANT -Wall -Wno-unused-function -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-variable -Wno-sign-compare -Wno-unused-local-typedefs -Wno-int-in-bool-context -Wno-attributes -Wno-ignored-attributes     -DSTAN_THREADS -I stan/lib/stan_math/lib/tbb_2019_U8/include   -O3 -I src -I stan/src -I lib/rapidjson_1.1.0/ -I stan/lib/stan_math/ -I stan/lib/stan_math/lib/eigen_3.3.7 -I stan/lib/stan_math/lib/boost_1.72.0 -I stan/lib/stan_math/lib/sundials_5.2.0/include  -D_USE_MATH_DEFINES  -DBOOST_DISABLE_ASSERTS              -Wl,-L,"C:/Users/Lin/Documents/.cmdstanr/cmdstan-2.25.0/stan/lib/stan_math/lib/tbb" -Wl,-rpath,"C:/Users/Lin/Documents/.cmdstanr/cmdstan-2.25.0/stan/lib/stan_math/lib/tbb"      C:/Users/Lin/AppData/Local/Temp/Rtmpk7MNNq/model-2a2421ee41b8.o src/cmdstan/main_threads.o  -static-libgcc -static-libstdc++       stan/lib/stan_math/lib/sundials_5.2.0/lib/libsundials_nvecserial.a stan/lib/stan_math/lib/sundials_5.2.0/lib/libsundials_cvodes.a stan/lib/stan_math/lib/sundials_5.2.0/lib/libsundials_idas.a stan/lib/stan_math/lib/sundials_5.2.0/lib/libsundials_kinsol.a  stan/lib/stan_math/lib/tbb/tbb.dll -o C:/Users/Lin/AppData/Local/Temp/Rtmpk7MNNq/model-2a2421ee41b8.exe
rm -f C:/Users/Lin/AppData/Local/Temp/Rtmpk7MNNq/model-2a2421ee41b8.o

The warning "The variable partial_sum may not have been assigned a value before its use." confuses and worries me. Is it caused by the pedantic = TRUE option in cmdstan_model()? Can I ignore it if sampling runs?

It looks like that comes from pedantic mode. Skimming quickly over the model, I don’t see a problem with partial_sum.


In the meantime, the "Compile a Stan program" documentation page doesn’t seem to include a full list of the name-value pairs accepted by cpp_options and stanc_options. I searched briefly but couldn’t find a page that lists them. Where should I look?

I’m trying to run my model on our HPC cluster, which offers several kinds of compute resources:

  • CPU cluster. Each node has 32 or 56 CPU cores (2.6GHz).
  • KNL cluster. Each node has 256 logical CPUs (1.30GHz).
  • GPU cluster. Each node has 4 NVIDIA P100 GPUs.

Running on a single thread on the CPU cluster is extremely slow, so I’m trying reduce_sum for within-chain parallelization. To set up reduce_sum on the CPU cluster, is adding stan_threads = TRUE to cpp_options all that’s needed?

@rok_cesnovar, can you help here?

Thanks for tagging me; I missed this.

Responding from my phone, so apologies for the formatting. The model compiles fine; the warnings come from pedantic mode, and the partial_sum warning is a bug in pedantic mode. Just ignore it, or, if it bothers you, remove warn-pedantic from stanc_options (in cmdstanr, pedantic = TRUE is what adds it).

The missing list is intentional: these options are documented elsewhere, and duplicating them would mean maintaining the same documentation in two places, which eventually goes out of date. We do need to link to those docs better.

For cpp_options, the valid name-value pairs are anything you can put in the make/local file, most importantly STAN_THREADS, STAN_MPI, …; you can also set CXXFLAGS and LDFLAGS there if needed.

The full list of stanc_options is all the arguments accepted by the Stan-to-C++ compiler (stanc3); see section 15.3 of "15 stanc: Translating Stan to C++" in the CmdStan User’s Guide.


So the bottom line: compile with STAN_THREADS=true in cpp_options and pass threads_per_chain = X to the $sample() call, where X is the number of threads per chain.
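A minimal sketch of that workflow, assuming cmdstanr and CmdStan are installed. run_threaded() is a hypothetical wrapper, not a cmdstanr function; the model file name is the one used earlier in this thread and stan_data is a placeholder for your own data list. It uses the lowercase stan_threads spelling, which is the one that produced a threaded executable later in this thread, and it leaves pedantic mode at its default (off), which also avoids the spurious partial_sum warning.

```r
# Hypothetical wrapper around the compile-then-sample steps described above.
run_threaded <- function(stan_file, data, chains = 2, threads_per_chain = 5) {
  mod <- cmdstanr::cmdstan_model(
    stan_file = stan_file,
    # Build a threaded executable; pedantic mode stays at its default (FALSE).
    cpp_options = list(stan_threads = TRUE)
  )
  mod$sample(
    data = data,
    chains = chains,
    parallel_chains = chains,              # run all chains concurrently
    threads_per_chain = threads_per_chain  # threads available to reduce_sum/map_rect
  )
}

# Example: 2 chains x 5 threads = 10 CPUs, matching --cpus-per-task=10.
# fit <- run_threaded("baye_seroFit_demo.stan", data = stan_data)
```

Keeping chains times threads_per_chain equal to the CPUs requested from the scheduler avoids oversubscribing the node.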

Also see the "Reduce Sum: A Minimal Example" case study by @wds15 and @bbbales2. It also uses cmdstanr, so you can replicate it directly.


Thanks a lot for the suggestion. Great, then I can live with that warning and test on the cluster now!


Compiling with cpp_options = list(STAN_THREADS = T) produces the warning "'threads_per_chain' is set but the model was not compiled with 'cpp_options = list(stan_threads = TRUE)' so 'threads_per_chain' will have no effect!", but multi-threading seems to be working.

Compiling with cpp_options = list(stan_threads = TRUE) does not give this warning, and the executable gets a "_threads" suffix.

This might be a false-positive warning in cmdstanr; I’ll take a look. You can confirm that multi-threading worked by checking fit$metadata(), which should have a num_threads element.
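That check can be scripted. A small sketch, assuming fit$metadata() returns a list with a threads_per_chain element, as in the output posted later in this thread; threads_ok() is a hypothetical helper, not part of cmdstanr.

```r
# Hypothetical helper: TRUE if the metadata reports the requested number of
# threads per chain. [[ ]] is used for an exact name match, because `$` on a
# list does partial matching.
threads_ok <- function(meta, expected) {
  got <- meta[["threads_per_chain"]]
  length(got) == 1L && isTRUE(as.integer(got) == as.integer(expected))
}

# Usage after sampling:
# threads_ok(fit$metadata(), expected = 31)
```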

Both cpp_options = list(STAN_THREADS = T) and cpp_options = list(STAN_THREADS = TRUE) give:

$threads_per_chain
[1] 1
$num_thread
[1] 1

while cpp_options = list(stan_threads = TRUE) gives:

$threads_per_chain
[1] 31
$num_thread
[1] 31
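Given the results above, one defensive option is to rewrite an uppercase STAN_THREADS name to the lowercase spelling before calling cmdstan_model(). This is a hypothetical workaround, not a cmdstanr feature; it only touches the stan_threads entry, since that is the only option whose spelling was tested in this thread.

```r
# Hypothetical workaround: rename STAN_THREADS (any case) to stan_threads,
# the spelling that produced a threaded executable above. Other entries
# such as CXXFLAGS are left untouched.
normalize_cpp_options <- function(cpp_options) {
  nm <- names(cpp_options)
  nm[toupper(nm) == "STAN_THREADS"] <- "stan_threads"
  names(cpp_options) <- nm
  cpp_options
}

normalize_cpp_options(list(STAN_THREADS = TRUE))
# $stan_threads
# [1] TRUE
```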