Many thanks for the quick and helpful replies. I’ve been trying a number of things today with mixed results.
@edm, at first I attempted to add that line to my .Renviron but wasn’t sure I was doing that correctly, so I then started pasting Sys.setenv(PKG_BUILD_EXTRA_FLAGS = "false") every time I restarted my session! I wasn’t sure which step exactly this applied to: installing the cmdstanr package, rebuilding CmdStan, or something else? Reading through the link and the mention of devtools / remotes got me thinking I should just follow the install.packages flavor of the cmdstanr package installation to sidestep the devtools issue.
@stevebronder no, this is my first time setting compiler flags. I’ve now also read a bit of the manual, but am certainly out of my depth. I’ll dump today’s experiments below.
A quick summary:
- My fastest time (8.2 + 8 seconds) came from a fresh install of every Stan-related package, with `Sys.setenv(PKG_BUILD_EXTRA_FLAGS = "false")` set before those installations (using only the `install.packages` version from the GitHub page) as well as before building CmdStan. But this time is still pretty slow.
- I tried giving only the compiler flags @stevebronder suggested (by setting `append = F` in the `cmdstan_make_local` call), and also giving them in addition to the defaults that came with my base installation: `"CXXFLAGS += -Wno-nonnull -D_UCRT"`, `"CXXFLAGS += -Wno-deprecated-declarations"`, `"TBB_CXXFLAGS= -D_UCRT"`. No real change.
- I learned quite a lot from this thread. One bit from that thread I tested out was dropping `-mtune=native` as well. My experiments suggest the timings are not sensitive to this flag.
- The other bit I learned from Aki’s thread was that my previous assessment “BLAS likely has little to do with Stan performance” is most definitely wrong! So I may try that avenue again when I have the time. I was trying to get the AMD Optimizing CPU Libraries (AOCL) to work, as that seems best for my chip. I just really struggled to get R to link to those libraries…
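On the `.Renviron` question from the first bullet, here is a minimal sketch of what the entry looks like and how to confirm R picked it up. It writes to a temporary file (the `renviron_path` name is only for illustration) rather than the real user-level file, which normally lives at `~/.Renviron`. Which build step actually reads `PKG_BUILD_EXTRA_FLAGS` is not settled here: the working assumption is that it is consulted by the tooling behind devtools / remotes when compiling an R package, not by CmdStan's own make build, and that should be checked against the linked documentation.

```r
# Stand-in for the real user-level file (normally ~/.Renviron);
# a temp file keeps this sketch from touching any real configuration.
renviron_path <- file.path(tempdir(), ".Renviron")

# .Renviron entries are plain NAME=value lines, one per line.
writeLines("PKG_BUILD_EXTRA_FLAGS=false", renviron_path)

# R reads the real file automatically at startup; readRenviron() loads
# a file into the current session without needing a restart.
readRenviron(renviron_path)

# Confirm the variable is visible to this session.
flag <- Sys.getenv("PKG_BUILD_EXTRA_FLAGS")
print(flag)
```

With the line in the real `.Renviron`, the `Sys.getenv()` check after a restart replaces pasting `Sys.setenv(...)` into every session.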
Anything else I could try in the meantime?
zacho's fumbling with compiler flags
library(cmdstanr)
# read closely:
?cmdstan_make_local()
# The cmdstan_make_local() function is used to read/write makefile flags and
# variables from/to the make/local file of a CmdStan installation. Writing to
# the make/local file can be used to permanently add makefile flags/variables to
# an installation. For example adding specific compiler switches, changing the
# C++ compiler, etc. A change to the make/local file should typically be
# followed by calling rebuild_cmdstan().
# we still good?
check_cmdstan_toolchain() # good
# for now, trying just what Steve Bronder suggested:
cpp_options <- list(
# "CXX" = "clang++",
"CXXFLAGS+= -O3 -march=native -mtune=native"
# PRECOMPILED_HEADERS = TRUE
)
cmdstan_make_local(cpp_options = cpp_options)
# [1] "CXXFLAGS += -Wno-nonnull -D_UCRT" "TBB_CXXFLAGS= -D_UCRT"
# [3] "CXXFLAGS += -Wno-deprecated-declarations" "CXXFLAGS+= -O3 -march=native -mtune=native"
# remember to finish with rebuild_cmdstan
rebuild_cmdstan()
# edited the .Rprofile to add the set_cmdstan_path() call below:
# usethis::edit_r_profile(scope = 'user')
# cmdstanr::set_cmdstan_path('C:/Users/au786542/.cmdstan/cmdstan-2.37.0/stan/lib/stan_math/lib/tbb')
# restarted R session, but opens to:
# CmdStan path set to: C:/Users/au786542/.cmdstan/cmdstan-2.37.0/stan/lib/stan_math/lib/tbb
# Warning message:
# Can't find CmdStan makefile to detect version number. Path may not point to valid installation.
# hmm.... indeed, the test below
source('speed test - brms poisson fit.R')
# results in:
# Error: CmdStan path has not been set yet. See ?set_cmdstan_path.
# In addition: Warning message:
# Can't find CmdStan makefile to detect version number. Path may not point to valid installation.
# I went back, deleted the .Rprofile (had nothing else in it), and let cmdstanr detect the path automatically
# as per the details in ?set_cmdstan_path
# restarted R, then:
source('speed test - brms poisson fit.R')
# which opens with:
# This is cmdstanr version 0.9.0.9000
# - CmdStanR documentation and vignettes: mc-stan.org/cmdstanr
# - CmdStan path: C:/Users/au786542/.cmdstan/cmdstan-2.37.0
# - CmdStan version: 2.37.0
# Compiling Stan program...
# that path *should* be the correct one that we just rebuilt
rstan::get_elapsed_time(fit1$fit)
#         warmup sample
# chain:1  8.557  8.418
# chain:2  8.655  7.906
# chain:3  8.593  8.456
# chain:4  8.263  8.022
# chain:5  8.964  8.425
# chain:6  8.809  8.208
# ran this a couple times, still the same speed
# I don't know if there are more compiler settings I should tweak...
# this link has a ton but it's all Greek to me https://gcc.gnu.org/onlinedocs/gcc/Option-Summary.html
Sys.setenv(PKG_BUILD_EXTRA_FLAGS = "false")
# rebuilt again, same compiler flag tweaks as before; restarted
source('speed test - brms poisson fit.R')
rstan::get_elapsed_time(fit1$fit)
# still the same
## read through this thread: https://discourse.mc-stan.org/t/speedup-by-using-external-blas-lapack-with-cmdstan-and-cmdstanr-py/25441/29
# Aki mentions something about specifics for Windows flags, specifically dropping -mtune=native
# let's try!
# restarting...
# actually, let's get a 'clean' cpp_options first (just read about the append argument)
library(cmdstanr)
cmdstan_make_local()
cpp_options <- list(
"CXXFLAGS+= -O3 -march=native -mtune=native"
)
cmdstan_make_local(cpp_options = cpp_options, append = FALSE)
cmdstan_make_local() # dropped the older ones
rebuild_cmdstan(cores = 4)
source('speed test - brms poisson fit.R') # nope
## try the above again, but also include that Sys.setenv call as well
library(cmdstanr)
cmdstan_make_local() # same as above
Sys.setenv(PKG_BUILD_EXTRA_FLAGS = "false")
rebuild_cmdstan(cores = 4)
source('speed test - brms poisson fit.R') # it got slower! 10ish seconds for both
# reduced to 4 cores *just to be sure*, now it's back in the 9 + 8.5 ballpark
## one more iteration: use all the flags that were originally there by default, plus -march=native, but NOT -mtune
# see how that goes first, then can try adding in mtune
library(cmdstanr)
cmdstan_make_local() # [1] "CXXFLAGS+= -O3 -march=native -mtune=native"
cpp_options <- list(
"CXXFLAGS+= -O3 -march=native", # no mtune for now
# these were there for my default installation:
"CXXFLAGS += -Wno-nonnull -D_UCRT",
"CXXFLAGS += -Wno-deprecated-declarations",
"TBB_CXXFLAGS= -D_UCRT"
)
cmdstan_make_local(cpp_options = cpp_options, append = FALSE)
cmdstan_make_local() # ok
## I've reinstalled rstan, StanHeaders, and cmdstanr all with Sys.setenv(PKG_BUILD_EXTRA_FLAGS = "false")
# and not using the remotes::install_github version but rather the install.packages version
Sys.setenv(PKG_BUILD_EXTRA_FLAGS = "false")
rebuild_cmdstan(cores = 4)
source('speed test - brms poisson fit.R') # 8.2 and 8!! fastest yet lmao
# 8 and 7.8!
## now that we have some progress, let's try the custom cpp options again
library(cmdstanr)
cmdstan_make_local()
# [1] "CXXFLAGS+= -O3 -march=native" "CXXFLAGS += -Wno-nonnull -D_UCRT"
# [3] "CXXFLAGS += -Wno-deprecated-declarations" "TBB_CXXFLAGS= -D_UCRT"
cpp_options <- list(
"CXXFLAGS+= -O3 -march=native -mtune=native", # whole shebang
# these were there for my default installation:
"CXXFLAGS += -Wno-nonnull -D_UCRT",
"CXXFLAGS += -Wno-deprecated-declarations",
"TBB_CXXFLAGS= -D_UCRT"
)
cmdstan_make_local(cpp_options = cpp_options, append = FALSE)
cmdstan_make_local() # ok
Sys.setenv(PKG_BUILD_EXTRA_FLAGS = "false")
rebuild_cmdstan(cores = 4)
source('speed test - brms poisson fit.R') # 8.8 and 9.4, 9 and 8.5, sometimes slower
## so maybe go back to the defaults without mtune?
library(cmdstanr)
cmdstan_make_local() # both march and mtune here
# [1] "CXXFLAGS+= -O3 -march=native -mtune=native" "CXXFLAGS += -Wno-nonnull -D_UCRT"
# [3] "CXXFLAGS += -Wno-deprecated-declarations" "TBB_CXXFLAGS= -D_UCRT"
cpp_options <- list(
"CXXFLAGS+= -O3 -march=native", # drop mtune
# these were there for my default installation:
"CXXFLAGS += -Wno-nonnull -D_UCRT",
"CXXFLAGS += -Wno-deprecated-declarations",
"TBB_CXXFLAGS= -D_UCRT"
)
cmdstan_make_local(cpp_options = cpp_options, append = FALSE)
cmdstan_make_local() # ok
Sys.setenv(PKG_BUILD_EXTRA_FLAGS = "false")
rebuild_cmdstan(cores = 4)
source('speed test - brms poisson fit.R') # 9 and 9.8, 9.4 and 9.3
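
Since the runs above are compared by eye from one or two fits each, a small summary of the per-chain times may help judge whether a flag change is outside the run-to-run noise. The sketch below uses only the six-chain timings reported for the first run; `summarise_elapsed()` is a hypothetical helper name, not part of rstan or cmdstanr. In a live session the matrix would come straight from `rstan::get_elapsed_time(fit1$fit)`.

```r
# Elapsed times (seconds) copied from the first run reported above.
elapsed <- matrix(
  c(8.557, 8.418,
    8.655, 7.906,
    8.593, 8.456,
    8.263, 8.022,
    8.964, 8.425,
    8.809, 8.208),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("chain:", 1:6), c("warmup", "sample"))
)

# Hypothetical helper: per-phase mean, standard deviation, and range
# across chains for a matrix shaped like get_elapsed_time() output.
summarise_elapsed <- function(x) {
  rbind(
    mean = colMeans(x),
    sd   = apply(x, 2, sd),
    min  = apply(x, 2, min),
    max  = apply(x, 2, max)
  )
}

s <- summarise_elapsed(elapsed)
print(round(s, 2))
```

In that single run the chains already differ by roughly 0.7 s in warmup and 0.55 s in sampling, so differences of a few tenths of a second between flag settings are hard to separate from noise without repeating each configuration several times.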