Abysmal performance on a new AMD laptop with Windows 11

Many thanks for the quick and helpful replies. I’ve been trying a number of things today with mixed results.

@edm, at first I tried adding that line to my .Renviron but wasn’t sure I was doing it correctly, so I started pasting Sys.setenv(PKG_BUILD_EXTRA_FLAGS = "false") every time I restarted my session! I also wasn’t sure which step this applies to: installing the cmdstanr package, rebuilding CmdStan, or something else? Reading through the link and its mention of devtools / remotes got me thinking I should just follow the install.packages route for installing cmdstanr to sidestep the devtools issue.
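For anyone else unsure about the .Renviron route, here is a minimal sketch of how the mechanism can be checked in base R. The variable name comes from the suggestion above; the temporary file is only a stand-in for the real user-level .Renviron (which R reads once at startup), so this is an illustration under that assumption rather than the actual setup.

```r
# Assumption: a temporary file stands in for the real user-level .Renviron.
renviron_file <- tempfile(fileext = ".Renviron")

# .Renviron entries are plain NAME=value lines (no Sys.setenv() call, no quotes needed).
writeLines("PKG_BUILD_EXTRA_FLAGS=false", renviron_file)

# R normally reads .Renviron at startup; readRenviron() loads a file on demand.
readRenviron(renviron_file)

# Confirm what the current session actually sees.
flag_value <- Sys.getenv("PKG_BUILD_EXTRA_FLAGS")
print(flag_value)
```

If Sys.getenv() returns an empty string in a fresh session, the line did not make it into a file that R reads at startup.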

@stevebronder no, this is my first time setting compiler flags. I’ve now read a bit of the manual, but I’m certainly out of my depth. I’ll dump today’s experiments below.

A quick summary:

  • My fastest time (8.2 + 8 seconds) came from a fresh install of every Stan-related package, with Sys.setenv(PKG_BUILD_EXTRA_FLAGS = "false") set before those installations (using only the install.packages version from the GitHub page) as well as before building CmdStan. That is still pretty slow.
  • I tried the compiler flags @stevebronder suggested, both on their own (by setting append = F in the cmdstan_make_local call) and in addition to the defaults that came with my base installation: "CXXFLAGS += -Wno-nonnull -D_UCRT", "CXXFLAGS += -Wno-deprecated-declarations", "TBB_CXXFLAGS= -D_UCRT". No real change.
  • I learned quite a lot from this thread. One thing from it that I tested was dropping -mtune=native. In my runs, including or dropping this flag made no consistent difference to the timings.
  • The other thing I learned from Aki’s thread was that my previous assessment “BLAS likely has little to do with Stan performance” is most definitely wrong! So I may try that avenue again when I have the time. I was trying to get the AMD Optimizing CPU Libraries (AOCL) to work, as that seems best for my chip. I just really struggled to get R to link to those libraries…
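On the BLAS point, a quick sketch of how to see which BLAS and LAPACK libraries an R session is currently linked against, using base R only. This reports what R itself uses; it is an assumption on my part that this is the relevant check for the AOCL attempt, and it says nothing about what a compiled CmdStan model links to.

```r
# Path of the BLAS library in use (may be an empty string on some builds).
blas_path <- unname(extSoftVersion()["BLAS"])

# Path and version of the LAPACK library in use.
lapack_path <- La_library()
lapack_version <- La_version()

cat("BLAS:   ", blas_path, "\n")
cat("LAPACK: ", lapack_path, " (version ", lapack_version, ")\n", sep = "")
```

sessionInfo() prints the same BLAS/LAPACK paths alongside the R version and platform, which is handy to paste into a thread like this one.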

Anything else I could try in the meantime?

zacho's fumbling with compiler flags
library(cmdstanr)
# read closely:
?cmdstan_make_local()
  # The cmdstan_make_local() function is used to read/write makefile flags and
  # variables from/to the make/local file of a CmdStan installation. Writing to
  # the make/local file can be used to permanently add makefile flags/variables to
  # an installation. For example adding specific compiler switches, changing the
  # C++ compiler, etc. A change to the make/local file should typically be
  # followed by calling rebuild_cmdstan().
# we still good?
check_cmdstan_toolchain() # good

# for now, trying just what Steve Bronder suggested:
cpp_options <- list(
  # "CXX" = "clang++",
  "CXXFLAGS+= -O3 -march=native -mtune=native"
  # PRECOMPILED_HEADERS = TRUE
)
cmdstan_make_local(cpp_options = cpp_options)
# [1] "CXXFLAGS += -Wno-nonnull -D_UCRT"           "TBB_CXXFLAGS= -D_UCRT"                     
# [3] "CXXFLAGS += -Wno-deprecated-declarations"   "CXXFLAGS+= -O3 -march=native -mtune=native"

# remember to finish with rebuild_cmdstan
rebuild_cmdstan()

# edited the user .Rprofile to add the set_cmdstan_path() line below
# usethis::edit_r_profile(scope = 'user')
# cmdstanr::set_cmdstan_path('C:/Users/au786542/.cmdstan/cmdstan-2.37.0/stan/lib/stan_math/lib/tbb')

# restarted R session, but opens to:
# CmdStan path set to: C:/Users/au786542/.cmdstan/cmdstan-2.37.0/stan/lib/stan_math/lib/tbb
# Warning message:
#   Can't find CmdStan makefile to detect version number. Path may not point to valid installation. 

  # hmm.... indeed, the test below
source('speed test - brms poisson fit.R')
  # results in:
# Error: CmdStan path has not been set yet. See ?set_cmdstan_path.
# In addition: Warning message:
#   Can't find CmdStan makefile to detect version number. Path may not point to valid installation. 

# I went back, deleted the .Rprofile (had nothing else in it), and let cmdstanr detect the path automatically
# as per the details in ?set_cmdstan_path
# restarted R, then:
source('speed test - brms poisson fit.R')
  # which opens with:
  # This is cmdstanr version 0.9.0.9000
  # - CmdStanR documentation and vignettes: mc-stan.org/cmdstanr
  # - CmdStan path: C:/Users/au786542/.cmdstan/cmdstan-2.37.0
  # - CmdStan version: 2.37.0
  # Compiling Stan program...

# that path *should* be the correct one that we just rebuilt
rstan::get_elapsed_time(fit1$fit)
  #         warmup sample
  # chain:1  8.557  8.418
  # chain:2  8.655  7.906
  # chain:3  8.593  8.456
  # chain:4  8.263  8.022
  # chain:5  8.964  8.425
  # chain:6  8.809  8.208

# ran this a couple times, still the same speed

# I don't know if there are more compiler settings I should tweak...
  # this link has a ton but it's all Greek to me https://gcc.gnu.org/onlinedocs/gcc/Option-Summary.html

Sys.setenv(PKG_BUILD_EXTRA_FLAGS = "false")
# rebuilt again, same compiler flag tweaks as before; restarted
source('speed test - brms poisson fit.R')
rstan::get_elapsed_time(fit1$fit)
# still the same

## read through this thread: https://discourse.mc-stan.org/t/speedup-by-using-external-blas-lapack-with-cmdstan-and-cmdstanr-py/25441/29
  # Aki mentions something about specifics for Windows flags, specifically dropping -mtune=native
  # let's try!
    # restarting...
  # actually, let's get a 'clean' cpp_options first (just read the append argument)
library(cmdstanr)
cmdstan_make_local()
cpp_options <- list(
  "CXXFLAGS+= -O3 -march=native -mtune=native" 
)
cmdstan_make_local(cpp_options = cpp_options, append = F)
cmdstan_make_local() # dropped the older ones
rebuild_cmdstan(cores = 4)
source('speed test - brms poisson fit.R') # nope

## try the above again, but also include that Sys.setenv call
library(cmdstanr)
cmdstan_make_local() # same as above
Sys.setenv(PKG_BUILD_EXTRA_FLAGS = "false")
rebuild_cmdstan(cores = 4)
source('speed test - brms poisson fit.R') # it got slower! 10ish seconds for both
# reduced to 4 cores *just to be sure*, now it's back in the 9 + 8.5 ballpark

## one more iteration: use all the flags that were originally there by default, plus -march=native, but NOT -mtune
  # see how that goes first, then can try adding in mtune
library(cmdstanr)
cmdstan_make_local() # [1] "CXXFLAGS+= -O3 -march=native -mtune=native"
cpp_options <- list(
  "CXXFLAGS+= -O3 -march=native", # no mtune for now
  # these were there for my default installation:
  "CXXFLAGS += -Wno-nonnull -D_UCRT",
  "CXXFLAGS += -Wno-deprecated-declarations",
  "TBB_CXXFLAGS= -D_UCRT"
)
cmdstan_make_local(cpp_options = cpp_options, append = F)
cmdstan_make_local() # ok


## I've reinstalled rstan, StanHeaders, and cmdstanr all with Sys.setenv(PKG_BUILD_EXTRA_FLAGS = "false")
  # and not using the remotes::install_github version but rather the install.packages version
Sys.setenv(PKG_BUILD_EXTRA_FLAGS = "false")
rebuild_cmdstan(cores = 4)
source('speed test - brms poisson fit.R') # 8.2 and 8!! fastest yet lmao
  # 8 and 7.8!

## now that we have some progress, let's try the custom cpp options again
library(cmdstanr)
cmdstan_make_local() 
# [1] "CXXFLAGS+= -O3 -march=native"             "CXXFLAGS += -Wno-nonnull -D_UCRT"        
# [3] "CXXFLAGS += -Wno-deprecated-declarations" "TBB_CXXFLAGS= -D_UCRT"            
cpp_options <- list(
  "CXXFLAGS+= -O3 -march=native -mtune=native", # whole shebang
  # these were there for my default installation:
  "CXXFLAGS += -Wno-nonnull -D_UCRT",
  "CXXFLAGS += -Wno-deprecated-declarations",
  "TBB_CXXFLAGS= -D_UCRT"
)
cmdstan_make_local(cpp_options = cpp_options, append = F)
cmdstan_make_local() # ok
Sys.setenv(PKG_BUILD_EXTRA_FLAGS = "false")
rebuild_cmdstan(cores = 4) 
source('speed test - brms poisson fit.R') # 8.8 and 9.4, 9 and 8.5, sometimes slower


## so maybe go back to default without mtune?
library(cmdstanr)
cmdstan_make_local()  # both march and mtune here
# [1] "CXXFLAGS+= -O3 -march=native -mtune=native" "CXXFLAGS += -Wno-nonnull -D_UCRT"          
# [3] "CXXFLAGS += -Wno-deprecated-declarations"   "TBB_CXXFLAGS= -D_UCRT"
cpp_options <- list(
  "CXXFLAGS+= -O3 -march=native", # drop mtune
  # these were there for my default installation:
  "CXXFLAGS += -Wno-nonnull -D_UCRT",
  "CXXFLAGS += -Wno-deprecated-declarations",
  "TBB_CXXFLAGS= -D_UCRT"
)
cmdstan_make_local(cpp_options = cpp_options, append = F)
cmdstan_make_local() # ok
Sys.setenv(PKG_BUILD_EXTRA_FLAGS = "false")
rebuild_cmdstan(cores = 4) 
source('speed test - brms poisson fit.R') # 9 and 9.8, 9.4 and 9.3
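Since I have been comparing runs by eye, here is a small sketch of how the per-chain timings could be condensed into one comparable summary per configuration. The numbers are the six-chain rstan::get_elapsed_time() output shown earlier in this script; summarise_elapsed is a hypothetical helper name, not part of rstan or cmdstanr.

```r
# Elapsed times in seconds, copied from the get_elapsed_time() output above.
elapsed <- matrix(
  c(8.557, 8.418,
    8.655, 7.906,
    8.593, 8.456,
    8.263, 8.022,
    8.964, 8.425,
    8.809, 8.208),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("chain:", 1:6), c("warmup", "sample"))
)

# Hypothetical helper: mean, min and max per phase across chains.
summarise_elapsed <- function(x) {
  rbind(
    mean = colMeans(x),
    min  = apply(x, 2, min),
    max  = apply(x, 2, max)
  )
}

timing_summary <- round(summarise_elapsed(elapsed), 2)
print(timing_summary)
```

In a real session the matrix would come straight from rstan::get_elapsed_time(fit1$fit), so each flag combination yields one small table to compare instead of numbers jotted in comments.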