I found that compiling a model with CmdStanR (0.5.3) through Windows Subsystem for Linux (WSL) is about 5 times slower than with native Windows CmdStan, at least in my environment (roughly 51-60 s versus 11-12 s for the Bernoulli example below). Is this reproducible on other Windows machines, or is it specific to my setup? Is there any way to speed up compilation with WSL? Any ideas are appreciated.
Test case
My R environment
> sessionInfo()
R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)
Matrix products: default
locale:
[1] LC_COLLATE=Japanese_Japan.utf8 LC_CTYPE=Japanese_Japan.utf8
[3] LC_MONETARY=Japanese_Japan.utf8 LC_NUMERIC=C
[5] LC_TIME=Japanese_Japan.utf8
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] cmdstanr_0.5.3
loaded via a namespace (and not attached):
[1] pillar_1.8.0 compiler_4.2.0 tools_4.2.0
[4] jsonlite_1.8.0 lifecycle_1.0.1 tibble_3.1.8
[7] gtable_0.3.0 checkmate_2.1.0 pkgconfig_2.0.3
[10] rlang_1.0.3 cli_3.3.0 DBI_1.1.3
[13] xfun_0.31 withr_2.5.0 dplyr_1.0.9
[16] knitr_1.39 generics_0.1.3 vctrs_0.4.1
[19] tictoc_1.0.1 grid_4.2.0 tidyselect_1.1.2
[22] data.table_1.14.2 glue_1.6.2 R6_2.5.1
[25] processx_3.7.0 fansi_1.0.3 distributional_0.3.0
[28] tensorA_0.36.2 ggplot2_3.3.6 farver_2.1.1
[31] purrr_0.3.4 posterior_1.2.2 magrittr_2.0.3
[34] ps_1.7.1 backports_1.4.1 scales_1.2.0
[37] abind_1.4-5 assertthat_0.2.1 colorspace_2.0-3
[40] renv_0.15.5 utf8_1.2.2 munsell_0.5.0
My WSL environment
- Ubuntu 20.04 LTS
- I installed the BLAS/LAPACK packages below in WSL, following @avehtari's post:
sudo apt-get install liblapacke-dev
sudo apt-get install liblapacke
sudo apt-get install libopenblas-dev
sudo apt-get install libopenblas-serial-dev
sudo apt-get install libopenblas0
sudo apt-get install libopenblas0-serial
Reproducible code
- Open R from Windows (NOT from WSL; I do not have R on my WSL)
- Install CmdStan, both the WSL version and the native Windows version, with the following cpp_options respectively:
  - for the WSL version,
    cpp_options <- list( "CXXFLAGS += -march=native -mtune=native -DEIGEN_USE_BLAS -DEIGEN_USE_LAPACKE", "LDLIBS += -lblas -llapack -llapacke" )
  - for the native Windows version,
    cpp_options <- list( "CXXFLAGS += -Wno-nonnull", "TBB_CXXFLAGS= -U__MSVCRT_VERSION__ -D__MSVCRT_VERSION__=0x0E00" )
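For concreteness, the two installs can be sketched roughly as follows. This is a minimal sketch, not a record of the exact calls I ran: the names cpp_options_wsl, cpp_options_win and install_both are hypothetical, the version string is taken from the install paths used in the timing code, and I am assuming the wsl argument of install_cmdstan() as provided by cmdstanr 0.5.3.

```r
# Compiler/linker flags for the WSL build (copied from the list above)
cpp_options_wsl <- list(
  "CXXFLAGS += -march=native -mtune=native -DEIGEN_USE_BLAS -DEIGEN_USE_LAPACKE",
  "LDLIBS += -lblas -llapack -llapacke"
)

# Compiler flags for the native Windows build (copied from the list above)
cpp_options_win <- list(
  "CXXFLAGS += -Wno-nonnull",
  "TBB_CXXFLAGS= -U__MSVCRT_VERSION__ -D__MSVCRT_VERSION__=0x0E00"
)

# Hypothetical helper: installs both builds when called.
# Nothing is downloaded or compiled until install_both() is invoked.
install_both <- function(version = "2.30.1") {
  # WSL build (assumes the `wsl` argument of install_cmdstan())
  cmdstanr::install_cmdstan(
    version = version,
    cpp_options = cpp_options_wsl,
    wsl = TRUE
  )
  # Native Windows build
  cmdstanr::install_cmdstan(
    version = version,
    cpp_options = cpp_options_win
  )
}
```

Calling install_both() from R on Windows should then produce the two install directories that the timing code switches between with set_cmdstan_path().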
- Run the following code (adapted from the "Getting started with CmdStanR" vignette):
library(cmdstanr)
file <- file.path(cmdstan_path(), "examples", "bernoulli", "bernoulli.stan")
data_list <- list(N = 10, y = c(0, 1, 0, 0, 0, 0, 0, 0, 0, 1))
## 2.30.1 (Non-WSL)
set_cmdstan_path("C:/Users/MY_USER_NAME/Documents/.cmdstan/cmdstan-2.30.1")
cmdstan_path()
cmdstan_version()
tictoc::tic()
mod_2.30.1_non_wsl <- cmdstan_model(
  file,
  force_recompile = TRUE ## since the same model is run multiple times
)
tictoc::toc() # 11.89 sec, 10.94 sec, 11.19 sec elapsed
tictoc::tic()
fit_2.30.1_non_wsl <- mod_2.30.1_non_wsl$sample(
  data = data_list,
  seed = 123,
  chains = 4,
  parallel_chains = 4,
  refresh = 500 # print update every 500 iters
)
tictoc::toc() # 3.4 sec, 3.37 sec, 3.34 sec elapsed
## 2.30.1 (WSL)
set_cmdstan_path("C:/Users/MY_USER_NAME/Documents/.cmdstan/wsl-cmdstan-2.30.1")
cmdstan_path()
cmdstan_version()
tictoc::tic()
mod_2.30.1_wsl <- cmdstan_model(
  file,
  force_recompile = TRUE ## since the same model is run multiple times
)
tictoc::toc() # 54.27 sec, 51.04 sec, 57.39 sec elapsed
tictoc::tic()
fit_2.30.1_wsl <- mod_2.30.1_wsl$sample(
  data = data_list,
  seed = 123,
  chains = 4,
  parallel_chains = 4,
  refresh = 500 # print update every 500 iters
)
tictoc::toc() # 4.94 sec, 4.64 sec, 4.67 sec elapsed
## 2.30.1 (WSL) with openblas setting
Sys.setenv(OPENBLAS_NUM_THREADS = "1")
Sys.getenv("OPENBLAS_NUM_THREADS")
set_cmdstan_path("C:/Users/MY_USER_NAME/Documents/.cmdstan/wsl-cmdstan-2.30.1")
cmdstan_path()
cmdstan_version()
tictoc::tic()
mod_2.30.1_wsl <- cmdstan_model(
  file,
  force_recompile = TRUE ## since the same model is run multiple times
)
tictoc::toc() # 54.19 sec, 60.53 sec, 51.86 sec elapsed
tictoc::tic()
fit_2.30.1_wsl <- mod_2.30.1_wsl$sample(
  data = data_list,
  seed = 123,
  chains = 4,
  parallel_chains = 4,
  refresh = 500 # print update every 500 iters
)
tictoc::toc() # 4.27 sec, 4.83 sec, 4.71 sec elapsed
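To make the comparison easier to read, the compile timings recorded in the comments above can be summarised with a few lines of base R. This is only a sketch: the variable names (compile_sec, mean_sec, slowdown) are made up for illustration, and the numbers are copied from the three runs of each configuration on my machine.

```r
# Compile times in seconds, copied from the toc() comments above
compile_sec <- list(
  native_windows   = c(11.89, 10.94, 11.19),
  wsl              = c(54.27, 51.04, 57.39),
  wsl_openblas_one = c(54.19, 60.53, 51.86)  # with OPENBLAS_NUM_THREADS = "1"
)

# Mean compile time per configuration
mean_sec <- sapply(compile_sec, mean)

# Slowdown of each configuration relative to native Windows
slowdown <- mean_sec / mean_sec[["native_windows"]]

print(round(mean_sec, 2))
print(round(slowdown, 2))
```

On these numbers the WSL builds take a little under 5 times as long to compile as the native Windows build, and setting OPENBLAS_NUM_THREADS makes no visible difference to compile time.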