I purchased a new computer with a powerful multicore CPU to run my Stan models faster. However, I am having some difficulty configuring my compiler (at least I think that’s the culprit). Any advice on speeding things up would be greatly appreciated.
When I run an example model with rstanarm on my new computer (Windows 10) and compare it to my old setup (OSX), the new machine samples about twice as fast. Great! That is about the speedup I would have expected from a significantly faster computer.
However, models that I compile myself (rstanarm models are precompiled, I think?) actually run roughly 20% slower on my new computer than on my older, slower machine. I am new to Windows and am wondering whether it has something to do with the compiler. (I have tried installing Ubuntu but alas have not succeeded yet.)
I have tried to illustrate this with the example below. Interestingly, there is also a large performance difference between rstanarm and brms, but I am not sure whether that is related to my question.
First, I set everything up following the RStan Getting Started guide:
dotR <- file.path(Sys.getenv("HOME"), ".R")
if (!file.exists(dotR)) dir.create(dotR)
M <- file.path(dotR, ifelse(.Platform$OS.type == "windows", "Makevars.win", "Makevars"))
if (!file.exists(M)) file.create(M)
# cat("\nCXX14FLAGS=-O3 -march=native -mtune=native",
# if( grepl("^darwin", R.version$os)) "CXX14FLAGS += -arch x86_64 -ftemplate-depth-256" else
# if (.Platform$OS.type == "windows") "CXX11FLAGS=-O3 -march=corei7 -mtune=corei7" else
# "CXX14FLAGS += -fPIC",
# file = M, sep = "\n", append = TRUE)
# file.edit(M)
readLines(M)
## [1] ""
## [2] "CXX14FLAGS=-O3 -march=native -mtune=native"
## [3] "CXX11FLAGS=-O3 -march=corei7 -mtune=corei7"
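To make it easier to see what this file actually sets, here is a minimal sketch that parses the three lines above into named flags. `parse_makevars` is a hypothetical helper written for this post, not part of R or rstan. It shows that the two C++ standards get different `-march` values (`native` for `CXX14FLAGS`, `corei7` for `CXX11FLAGS`); which of the two applies depends on the standard the model is compiled with, which I have not confirmed here. `tools::makevars_user()` is base R and reports which user Makevars file the session would read.

```r
# Hypothetical helper (not part of R or rstan): parse simple "NAME=VALUE"
# lines from a Makevars file into a named character vector.
parse_makevars <- function(lines) {
  lines <- trimws(lines)
  # Keep only plain assignments; comments, blanks and "+=" lines are skipped
  lines <- lines[grepl("^[A-Za-z0-9_]+=", lines)]
  keys <- sub("=.*$", "", lines)
  values <- sub("^[^=]*=", "", lines)
  stats::setNames(values, keys)
}

# The three lines printed by readLines(M) above
makevars_lines <- c(
  "",
  "CXX14FLAGS=-O3 -march=native -mtune=native",
  "CXX11FLAGS=-O3 -march=corei7 -mtune=corei7"
)
flags <- parse_makevars(makevars_lines)
print(flags)

# Path of the user-level Makevars file this R session would read
# (character(0) if there is none)
print(tools::makevars_user())
```

On a real setup, `parse_makevars(readLines(M))` would do the same for the file on disk.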
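library(rstanarm)
library(lme4)  # provides the sleepstudy data

# Time one chain of the precompiled rstanarm model
system.time(
  stan_lmer(
    Reaction ~ Days + (Days | Subject),
    data = sleepstudy,
    iter = 10000,
    chains = 1
  )
)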
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 1).
## Chain 1: Elapsed Time: 13.658 seconds (Warm-up)
## Chain 1: 9.733 seconds (Sampling)
## Chain 1: 23.391 seconds (Total)
## user system elapsed
## 24.31 0.01 24.32
I ran the same code on my old OSX computer, where it took ~50 s. I am very happy with this speedup. However, things are different for models that I compile myself, as shown below. I first compile the model so that I can time just the sampling afterwards:
library(brms)

# chains = 0 compiles the model without drawing any samples
brms_sampler <- brm(
  Reaction ~ Days + (Days | Subject),
  data = sleepstudy,
  chains = 0
)
And then sample:
system.time(
  update(brms_sampler, iter = 10000, chains = 1)
)
## SAMPLING FOR MODEL '49d09615cc885efe2c01f11482fd096d' NOW (CHAIN 1).
## Chain 1: Elapsed Time: 8.693 seconds (Warm-up)
## Chain 1: 7.492 seconds (Sampling)
## Chain 1: 16.185 seconds (Total)
## user system elapsed
## 16.61 0.00 16.61
(I don’t know why brms would be this much faster than rstanarm!) I ran the same code on my old, slower OSX machine, and it ran in ~12 seconds. So I wonder what is hurting the performance of my newer machine, such that models compiled on it run slower than the equivalent ones on a slower machine. I also wonder whether this has something to do with the CmdStan installation error I posted about earlier (Error installing CmdStan (Windows 10; Command 'mingw32-make.exe' not found @win/processx.c:983)).
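To put the four timings side by side, here is a small sketch that computes the ratio of new to old runtime for each model. The new-machine values are the `elapsed` times printed above; the old-machine values are only the approximate ~50 s and ~12 s figures quoted in this post, so the ratios are rough.

```r
# Elapsed times in seconds; old_osx values are approximate (quoted in the post)
timings <- data.frame(
  model   = c("rstanarm (precompiled)", "brms (compiled locally)"),
  old_osx = c(50, 12),
  new_win = c(24.32, 16.61)
)

# Ratio below 1: the new machine is faster; above 1: the new machine is slower
timings$new_over_old <- round(timings$new_win / timings$old_osx, 2)
print(timings)
```

A ratio below 1 for rstanarm and above 1 for brms is the pattern I am asking about. Single runs are noisy, so repeating each timing a few times would make the comparison more reliable.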
Session and computer details below:
sessionInfo()
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 18362)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United Kingdom.1252
## [2] LC_CTYPE=English_United Kingdom.1252
## [3] LC_MONETARY=English_United Kingdom.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United Kingdom.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] brms_2.13.3 rstanarm_2.19.3 Rcpp_1.0.4.6 lme4_1.1-23
## [5] Matrix_1.2-18
benchmarkme::get_cpu()
$vendor_id
[1] "AuthenticAMD"
$model_name
[1] "AMD Ryzen 9 3900X 12-Core Processor"
$no_of_cores
[1] 24
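The 24 reported here is presumably the number of logical cores (threads), since the model name says 12-Core. As a minimal sketch, base R's `parallel` package can report both counts, and `options(mc.cores = ...)` is the usual way to let Stan-based packages run chains in parallel. Note that the timings above use a single chain, so the core count should not affect them; the choice of physical cores below is my own assumption, not a recommendation from any guide.

```r
library(parallel)

# detectCores() can return NA on platforms where the count is unknown
logical_cores  <- detectCores(logical = TRUE)
physical_cores <- detectCores(logical = FALSE)
cat("logical:", logical_cores, " physical:", physical_cores, "\n")

# Assumption: use one worker per physical core, falling back to 1 if unknown
n_workers <- if (is.na(physical_cores)) 1L else as.integer(physical_cores)
options(mc.cores = n_workers)
print(getOption("mc.cores"))
```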
Thanks very much for your time.