I’m using map_rect for within-chain parallelization on an HPC cluster (Linux). As a first test, I’m trying to run 2 chains with 2 cores, using the following setting for RStan:
However, the “top” output on the cluster shows that my RStan model spawns more than 900 threads, most of which are sleeping, and the model runs very slowly.
Could you advise how to set up multi-threading for map_rect correctly?
I also noticed several posts mentioning CmdStanR. Is it worth switching from RStan to CmdStanR for better and easier within-chain parallelization?
After compilation, I received the following warning messages:
Compiling Stan program...
--- Translating Stan model to C++ code ---
bin/stanc.exe --warn-pedantic --name='baye_seroFit_demo_model' --o=C:/Users/Lin/AppData/Local/Temp/Rtmpk7MNNq/model-2a2421ee41b8.hpp C:/Users/Lin/AppData/Local/Temp/Rtmpk7MNNq/model-2a2421ee41b8.stan
Warning:
The parameter logQS_relaRho_prewin_vec has no priors.
Warning:
The parameter logQS_rho_K2nd_lastwin_vec has no priors.
Warning:
The parameter log_crossProtect has no priors.
Warning:
The parameter log_relaFOI_samp_Mat has no priors.
Warning:
The parameter log_sumFOI_vec has no priors.
Warning at 'C:/Users/Lin/AppData/Local/Temp/Rtmpk7MNNq/model-2a2421ee41b8.stan', line 362, column 4 to column 15:
The variable partial_sum may not have been assigned a value before its use.
Warning:
The parameter logQS_relaRho_prewin_vec has no priors.
Warning:
The parameter logQS_rho_K2nd_lastwin_vec has no priors.
Warning:
The parameter log_crossProtect has no priors.
Warning:
The parameter log_relaFOI_samp_Mat has no priors.
Warning:
The parameter log_sumFOI_vec has no priors.
Warning at 'C:/Users/Lin/AppData/Local/Temp/Rtmpk7MNNq/model-2a2421ee41b8.stan', line 362, column 4 to column 15:
The variable partial_sum may not have been assigned a value before its use.
--- Compiling, linking C++ code ---
g++ -std=c++1y -m64 -D_REENTRANT -Wall -Wno-unused-function -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-variable -Wno-sign-compare -Wno-unused-local-typedefs -Wno-int-in-bool-context -Wno-attributes -Wno-ignored-attributes -DSTAN_THREADS -I stan/lib/stan_math/lib/tbb_2019_U8/include -O3 -I src -I stan/src -I lib/rapidjson_1.1.0/ -I stan/lib/stan_math/ -I stan/lib/stan_math/lib/eigen_3.3.7 -I stan/lib/stan_math/lib/boost_1.72.0 -I stan/lib/stan_math/lib/sundials_5.2.0/include -D_USE_MATH_DEFINES -DBOOST_DISABLE_ASSERTS -c -Wno-ignored-attributes -x c++ -o C:/Users/Lin/AppData/Local/Temp/Rtmpk7MNNq/model-2a2421ee41b8.o C:/Users/Lin/AppData/Local/Temp/Rtmpk7MNNq/model-2a2421ee41b8.hpp
g++ -std=c++1y -m64 -D_REENTRANT -Wall -Wno-unused-function -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-variable -Wno-sign-compare -Wno-unused-local-typedefs -Wno-int-in-bool-context -Wno-attributes -Wno-ignored-attributes -DSTAN_THREADS -I stan/lib/stan_math/lib/tbb_2019_U8/include -O3 -I src -I stan/src -I lib/rapidjson_1.1.0/ -I stan/lib/stan_math/ -I stan/lib/stan_math/lib/eigen_3.3.7 -I stan/lib/stan_math/lib/boost_1.72.0 -I stan/lib/stan_math/lib/sundials_5.2.0/include -D_USE_MATH_DEFINES -DBOOST_DISABLE_ASSERTS -Wl,-L,"C:/Users/Lin/Documents/.cmdstanr/cmdstan-2.25.0/stan/lib/stan_math/lib/tbb" -Wl,-rpath,"C:/Users/Lin/Documents/.cmdstanr/cmdstan-2.25.0/stan/lib/stan_math/lib/tbb" C:/Users/Lin/AppData/Local/Temp/Rtmpk7MNNq/model-2a2421ee41b8.o src/cmdstan/main_threads.o -static-libgcc -static-libstdc++ stan/lib/stan_math/lib/sundials_5.2.0/lib/libsundials_nvecserial.a stan/lib/stan_math/lib/sundials_5.2.0/lib/libsundials_cvodes.a stan/lib/stan_math/lib/sundials_5.2.0/lib/libsundials_idas.a stan/lib/stan_math/lib/sundials_5.2.0/lib/libsundials_kinsol.a stan/lib/stan_math/lib/tbb/tbb.dll -o C:/Users/Lin/AppData/Local/Temp/Rtmpk7MNNq/model-2a2421ee41b8.exe
rm -f C:/Users/Lin/AppData/Local/Temp/Rtmpk7MNNq/model-2a2421ee41b8.o
Among these, the warning "The variable partial_sum may not have been assigned a value before its use." confuses and worries me. Is this warning caused by the option pedantic = TRUE in cmdstan_model? Can it be ignored as long as sampling runs?
Also, the "Compile a Stan program" documentation page does not seem to give a full list of name-value pairs for cpp_options and stanc_options. I searched online briefly but could not find a page that lists them. Could you suggest where to look?
I’m trying to run my model on this HPC cluster. We have several options for computing resources:
CPU cluster. Each node has 32 or 56 CPU cores (2.6GHz).
KNL cluster. Each node has 256 logical CPUs (1.30GHz).
GPU cluster. Each node has 4 NVIDIA P100 GPUs.
Using a single thread on this CPU cluster is pathologically slow, so I’m trying reduce_sum for within-chain parallelization. To set up reduce_sum on the CPU cluster, is it enough to add stan_threads = TRUE to cpp_options?
Responding on my phone, so apologies for the bad formatting. The model compiles fine. The warnings come from pedantic mode, and the partial_sum warning is a bug in pedantic mode, so just ignore it. If it annoys you, remove warn-pedantic from stanc_options.
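For reference, a minimal sketch of compiling without the pedantic warnings, assuming a cmdstanr version whose compile step accepts the pedantic argument mentioned in the question. The tiny model is a hypothetical stand-in, not the model from this thread.

```r
library(cmdstanr)

# Hypothetical stand-in model; replace with the path to your own .stan file.
stan_file <- write_stan_file("
parameters {
  real mu;
}
model {
  mu ~ normal(0, 1);
}
")

# pedantic = FALSE is spelled out here for contrast with pedantic = TRUE,
# which is what adds --warn-pedantic to the stanc call shown in the log.
mod <- cmdstan_model(stan_file, pedantic = FALSE)
```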
The missing list is intentional: the options are documented elsewhere, and duplicating them would leave two places for the same documentation, one of which would eventually go out of date. We do need to link to those docs a bit better.
The full list of name-value pairs for cpp_options is anything you can place in the make/local file, most importantly STAN_THREADS, STAN_MPI, … You can also place CXXFLAGS and LDFLAGS here if needed.
So the bottom line is: compile with STAN_THREADS=true in cpp_options and supply threads_per_chain=X to the $sample() call, where X is the number of threads per chain.
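Roughly, that could look like the sketch below. It is an illustration under assumptions, not the model from this thread: the Stan program, the simulated data, and the grainsize value are all made up, and the array[...] syntax needs a newer CmdStan than 2.25 (on older versions write real y[N] and real[] y_slice instead).

```r
library(cmdstanr)

# Toy model (hypothetical): a normal likelihood split across threads with reduce_sum.
stan_file <- write_stan_file("
functions {
  real partial_sum(array[] real y_slice, int start, int end,
                   real mu, real sigma) {
    return normal_lpdf(y_slice | mu, sigma);
  }
}
data {
  int<lower=1> N;
  array[N] real y;
  int<lower=1> grainsize;
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  mu ~ normal(0, 5);
  sigma ~ normal(0, 2);
  target += reduce_sum(partial_sum, y, grainsize, mu, sigma);
}
")

# Compile with threading support enabled.
mod <- cmdstan_model(stan_file, cpp_options = list(stan_threads = TRUE))

# Simulated data, only so the example runs end to end.
set.seed(1)
y <- rnorm(1000, mean = 1, sd = 2)

# 2 chains in parallel, each using 2 threads (4 cores in total).
fit <- mod$sample(
  data = list(N = length(y), y = y, grainsize = 125),
  chains = 2,
  parallel_chains = 2,
  threads_per_chain = 2
)
```

The total core demand is parallel_chains times threads_per_chain, so on a shared cluster the job request should match that product.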
Compiling with cpp_options = list(STAN_THREADS = T) produces the warning "'threads_per_chain' is set but the model was not compiled with 'cpp_options = list(stan_threads = TRUE)' so 'threads_per_chain' will have no effect!", but multi-threading still seems to work.
Compiling with cpp_options = list(stan_threads = TRUE) does not give this warning, and the executable file gets the suffix “_threads”.
This might be a false positive warning in cmdstanr. Will take a look. You can make sure multi-threading worked by checking fit$metadata(). It should have a num_threads element.
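A small helper for that check might look like this. It is only a sketch: the helper name thread_metadata is made up, fit is assumed to be the object returned by $sample(), and since the exact field name may differ between cmdstanr versions it simply lists every metadata entry whose name mentions threads.

```r
# Hypothetical helper: return the metadata entries whose names mention threads.
# `fit` is assumed to be the fitted object returned by mod$sample().
thread_metadata <- function(fit) {
  meta <- fit$metadata()
  meta[grepl("thread", names(meta), ignore.case = TRUE)]
}
```

Calling thread_metadata(fit) after sampling returns an empty list if no thread-related field was recorded.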