Abysmal performance on a new AMD laptop with Windows 11

Good day all,

I need help with speeding up my hardware, which is inexcusably slow. I suspect it’s my OS or some aspect of the setup.

I have a new laptop through my university running (unfortunately) Windows 11. The CPU is reasonably new (late 2023) and ought to be quite performant: AMD Ryzen 7 Pro 7735U.

However, I have only felt this pervading sense of sluggishness. Models in brms that, in my experience, should fit in a few seconds now take minutes or longer. Anything halfway complicated seems out of the question.

So I tried a simple benchmark for comparing with my older machines using the Poisson model in ?brms::brm bumped up to 4000 iterations:

library(brms); library(cmdstanr)
set.seed(1234)
fit1 <- brm(
  count ~ zBase * Trt + (1|patient),
  data = epilepsy, family = poisson(),
  prior = prior(normal(0, 10), class = b) +
    prior(cauchy(0, 2), class = sd),
  backend = 'cmdstanr', cores = 6, chains = 6, iter = 4000
)
rstan::get_elapsed_time(fit1$fit)

(Note: parallel::detectCores() returns 16 but the CPU has 8 physical cores, so I stay at or below 8 cores generally.)
On this machine, I often get 9 to 10 seconds warmup and 8ish seconds on sampling (averaging chains). On a 10-year-old laptop with some mid-range Intel CPU for the time (but running Ubuntu), I get around 5 and 4.2 seconds… so half the time. On a 2019 laptop running Windows but with an Intel CPU (i7-9750H), I get 5.3 and 6 seconds. Neither of these older CPUs should in theory outperform my current CPU, so I figured it’s something else. A colleague with a recent Intel CPU and Windows 11 (and presumably all the same university IT settings) gets 5.3 and 4.8 seconds.
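
On the core count mentioned in the note above: parallel::detectCores() counts logical processors (hardware threads) by default, which is presumably why it reports 16 on an 8-core chip. A minimal sketch for querying both counts follows. The logical = FALSE argument is honoured on Windows and macOS, but as far as I know not on every platform, so elsewhere it may return NA or the logical count.

```r
library(parallel)

# Logical processors (hardware threads); this is the default behaviour.
n_logical <- detectCores(logical = TRUE)

# Physical cores where the platform can report them; may be NA
# or fall back to the logical count on some systems.
n_physical <- detectCores(logical = FALSE)

cat("logical:", n_logical, " physical:", n_physical, "\n")
```

Passing the physical count (or fewer) to the cores argument of brm() matches the "stay at or below 8" rule of thumb.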

I think I’ve disabled Windows 11 ‘efficiency’ power settings and told the OEM software to use high-performance everywhere. I’ve tried fresh installs of R, Rtools, rstan, cmdstan, and so on. I wasted a day attempting to switch out my BLAS without success (only later did I realise that BLAS likely has little to do with Stan performance). I learned a bit in that process about differences between Intel and AMD CPUs and their optimisation for scientific computing. But this is now beyond my level.

Is there something I should look for during my Stan installation? Some setting I may have missed? Something particular to AMD chips?

Any pointers are much appreciated. This is driving me insane! I’m not a hardware guy or speed demon, but would just like my brms fits to get a move on :^)

sessionInfo() call for the machine in question
> sessionInfo()
R version 4.5.1 (2025-06-13 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26100)

Matrix products: default
  LAPACK version 3.12.1

locale:
[1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United States.utf8    LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                           LC_TIME=English_United States.utf8    

time zone: Europe/Copenhagen
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] cmdstanr_0.9.0.9000 brms_2.23.0         Rcpp_1.1.0         

loaded via a namespace (and not attached):
 [1] Matrix_1.7-3          bayesplot_1.14.0      jsonlite_2.0.0        gtable_0.3.6          dplyr_1.1.4          
 [6] compiler_4.5.1        tidyselect_1.2.1      stringr_1.5.2         parallel_4.5.1        scales_1.4.0         
[11] lattice_0.22-7        coda_0.19-4.1         ggplot2_4.0.0         R6_2.6.1              Brobdingnag_1.2-9    
[16] generics_0.1.4        distributional_0.5.0  knitr_1.50            backports_1.5.0       checkmate_2.3.3      
[21] tibble_3.3.0          pillar_1.11.1         RColorBrewer_1.1-3    posterior_1.6.1       rlang_1.1.6          
[26] stringi_1.8.7         xfun_0.53             S7_0.2.0              RcppParallel_5.1.11-1 estimability_1.5.1   
[31] cli_3.6.5             magrittr_2.0.4        ps_1.9.1              emmeans_2.0.0         rstantools_2.5.0     
[36] processx_3.8.6        grid_4.5.1            xtable_1.8-4          rstudioapi_0.17.1     mvtnorm_1.3-3        
[41] lifecycle_1.0.4       nlme_3.1-168          vctrs_0.6.5           evaluate_1.0.5        tensorA_0.36.2.1     
[46] glue_1.8.0            farver_2.1.2          bridgesampling_1.1-2  abind_1.4-8           matrixStats_1.5.0    
[51] tools_4.5.1           loo_2.8.0             pkgconfig_2.0.3  

Hello! Are you setting any compiler flags? When building cmdstan and stan models it can be helpful to set -O3 -march=native -mtune=native


It sounds similar to other issues where a debug build was being created and making everything take longer: How to use devtools::load_all() in developing rstantools packages

The bottom of that thread has a Sys.setenv() command that you might try.


Many thanks for the quick and helpful replies. I’ve been trying a number of things today with mixed results.

@edm, at first I attempted to add that line to my .Renviron but wasn’t sure I was doing that correctly, so I then started pasting Sys.setenv(PKG_BUILD_EXTRA_FLAGS = "false") every time I restarted my session! I wasn’t sure exactly which step this applied to: installing the cmdstanr package, rebuilding CmdStan, or something else? Reading through the link and the mention of devtools / remotes got me thinking I should just follow the install.packages flavor for the cmdstanr package installation to sidestep the devtools issue.
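
For reference, a .Renviron file takes plain NAME=value lines rather than Sys.setenv() calls. A rough sketch follows, using a temporary file as a stand-in for the real ~/.Renviron (the stand-in is an assumption, purely so the snippet is safe to run). readRenviron() loads the file into the current session. As far as I understand, PKG_BUILD_EXTRA_FLAGS is read by pkgbuild when it compiles package code, so it may not influence rebuild_cmdstan() at all.

```r
# Stand-in for the real ~/.Renviron so this sketch does not touch user files.
renviron_path <- tempfile(fileext = ".Renviron")

# .Renviron syntax: one NAME=value pair per line.
writeLines("PKG_BUILD_EXTRA_FLAGS=false", renviron_path)

# Load the file into the running session
# (R does this automatically for ~/.Renviron at startup).
readRenviron(renviron_path)

# Confirm the variable is now visible to R.
flag <- Sys.getenv("PKG_BUILD_EXTRA_FLAGS")
print(flag)
```

With the line in the real .Renviron, the Sys.getenv() check after a restart is enough to confirm it took effect, and the repeated Sys.setenv() pasting becomes unnecessary.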

@stevebronder no, now this is my first time setting compiler flags. I’ve now also read a bit from the manual, but am certainly out of my depth. I’ll dump below today’s experiments.

A quick summary:

  • My fastest time (8.2 + 8 seconds) was from a fresh install of every Stan-related package while specifying Sys.setenv(PKG_BUILD_EXTRA_FLAGS = "false") before those installations (only using the install.packages version from the github page) as well as before building cmdstan. But this time is still pretty slow.
  • I tried giving only the compiler flags @stevebronder suggested (by setting append = F in the cmdstan_make_local call), and also tried them in addition to the defaults that came with my base installation: "CXXFLAGS += -Wno-nonnull -D_UCRT", "CXXFLAGS += -Wno-deprecated-declarations", "TBB_CXXFLAGS= -D_UCRT". No real change either way.
  • I learned quite a lot from this thread on using an external BLAS/LAPACK with CmdStan. One bit from that thread I tested out was dropping the -mtune=native flag. My experiments suggest that this flag makes little difference either way.
  • The other bit I learned from Aki’s thread was that my previous assessment “BLAS likely has little to do with Stan performance” is most definitely wrong! So I may try that avenue again when I have the time. I was trying to get the AMD Optimizing CPU Libraries (AOCL) to work, as that seems best for my chip. I just really struggled to get R to link to those libraries…

Anything else I could try in the meantime?

zacho's fumbling with compiler flags
library(cmdstanr)
# read closely:
?cmdstan_make_local()
  # The cmdstan_make_local() function is used to read/write makefile flags and
  # variables from/to the make/local file of a CmdStan installation. Writing to
  # the make/local file can be used to permanently add makefile flags/variables to
  # an installation. For example adding specific compiler switches, changing the
  # C++ compiler, etc. A change to the make/local file should typically be
  # followed by calling rebuild_cmdstan().
# we still good?
check_cmdstan_toolchain() # good

# for now, trying just what Steve Bronder suggested:
cpp_options <- list(
  # "CXX" = "clang++",
  "CXXFLAGS+= -O3 -march=native -mtune=native"
  # PRECOMPILED_HEADERS = TRUE
)
cmdstan_make_local(cpp_options = cpp_options)
# [1] "CXXFLAGS += -Wno-nonnull -D_UCRT"           "TBB_CXXFLAGS= -D_UCRT"                     
# [3] "CXXFLAGS += -Wno-deprecated-declarations"   "CXXFLAGS+= -O3 -march=native -mtune=native"

# remember to finish with rebuild_cmdstan
rebuild_cmdstan()

# edited the profile to add the CmdStan path:
# usethis::edit_r_profile(scope = 'user')
# cmdstanr::set_cmdstan_path('C:/Users/au786542/.cmdstan/cmdstan-2.37.0/stan/lib/stan_math/lib/tbb')

# restarted R session, but opens to:
# CmdStan path set to: C:/Users/au786542/.cmdstan/cmdstan-2.37.0/stan/lib/stan_math/lib/tbb
# Warning message:
#   Can't find CmdStan makefile to detect version number. Path may not point to valid installation. 

  # hmm.... indeed, the test below
source('speed test - brms poisson fit.R')
  # results in:
# Error: CmdStan path has not been set yet. See ?set_cmdstan_path.
# In addition: Warning message:
#   Can't find CmdStan makefile to detect version number. Path may not point to valid installation. 

# I went back, deleted the .rprofile (had nothing else in it), and let cmdstan detect it automatically
# as per the details in ?set_cmdstan_path
# restarted R, then:
source('speed test - brms poisson fit.R')
  # which opens with:
  # This is cmdstanr version 0.9.0.9000
  # - CmdStanR documentation and vignettes: mc-stan.org/cmdstanr
  # - CmdStan path: C:/Users/au786542/.cmdstan/cmdstan-2.37.0
  # - CmdStan version: 2.37.0
  # Compiling Stan program...

# that path *should* be the correct one that we just rebuilt
rstan::get_elapsed_time(fit1$fit)
  #         warmup sample
  # chain:1  8.557  8.418
  # chain:2  8.655  7.906
  # chain:3  8.593  8.456
  # chain:4  8.263  8.022
  # chain:5  8.964  8.425
  # chain:6  8.809  8.208

# ran this a couple times, still the same speed

# I don't know if there are more compiler settings I should tweak...
  # this link has a ton but it's all Greek to me https://gcc.gnu.org/onlinedocs/gcc/Option-Summary.html

Sys.setenv(PKG_BUILD_EXTRA_FLAGS = "false")
# rebuilt again, same compiler flag tweaks as before; restarted
source('speed test - brms poisson fit.R')
rstan::get_elapsed_time(fit1$fit)
# still the same

## read through this thread: https://discourse.mc-stan.org/t/speedup-by-using-external-blas-lapack-with-cmdstan-and-cmdstanr-py/25441/29
  # Aki mentions something about specifics for Windows flags, specifically dropping -mtune=native
  # let's try!
    # restarting...
  # actually, let's get a 'clean' cpp_options first (just read the append argument)
library(cmdstanr)
cmdstan_make_local()
cpp_options <- list(
  "CXXFLAGS+= -O3 -march=native -mtune=native" 
)
cmdstan_make_local(cpp_options = cpp_options, append = F)
cmdstan_make_local() # dropped the older ones
rebuild_cmdstan(cores = 4)
source('speed test - brms poisson fit.R') # nope

## try the above again, but also include that Sys.setenv call as well
library(cmdstanr)
cmdstan_make_local() # same as above
Sys.setenv(PKG_BUILD_EXTRA_FLAGS = "false")
rebuild_cmdstan(cores = 4)
source('speed test - brms poisson fit.R') # it got slower! 10ish seconds for both
# reduced to 4 cores *just to be sure*, now it's back in the 9 + 8.5 ballpark

## one more iteration: use all the flags that were originally there by default, plus -march=native, but NOT -mtune
  # see how that goes first, then can try adding in mtune
library(cmdstanr)
cmdstan_make_local() # [1] "CXXFLAGS+= -O3 -march=native -mtune=native"
cpp_options <- list(
  "CXXFLAGS+= -O3 -march=native", # no mtune for now
  # these were there for my default installation:
  "CXXFLAGS += -Wno-nonnull -D_UCRT",
  "CXXFLAGS += -Wno-deprecated-declarations",
  "TBB_CXXFLAGS= -D_UCRT"
)
cmdstan_make_local(cpp_options = cpp_options, append = F)
cmdstan_make_local() # ok


## I've reinstalled rstan, StanHeaders, and cmdstanr all with Sys.setenv(PKG_BUILD_EXTRA_FLAGS = "false")
  # and not using the remotes::install_github version but rather the install.packages version
Sys.setenv(PKG_BUILD_EXTRA_FLAGS = "false")
rebuild_cmdstan(cores = 4)
source('speed test - brms poisson fit.R') # 8.2 and 8!! fastest yet lmao
  # 8 and 7.8!

## now that we have some progress, let's try the custom cpp options again
library(cmdstanr)
cmdstan_make_local() 
# [1] "CXXFLAGS+= -O3 -march=native"             "CXXFLAGS += -Wno-nonnull -D_UCRT"        
# [3] "CXXFLAGS += -Wno-deprecated-declarations" "TBB_CXXFLAGS= -D_UCRT"            
cpp_options <- list(
  "CXXFLAGS+= -O3 -march=native -mtune=native", # whole shebang
  # these were there for my default installation:
  "CXXFLAGS += -Wno-nonnull -D_UCRT",
  "CXXFLAGS += -Wno-deprecated-declarations",
  "TBB_CXXFLAGS= -D_UCRT"
)
cmdstan_make_local(cpp_options = cpp_options, append = F)
cmdstan_make_local() # ok
Sys.setenv(PKG_BUILD_EXTRA_FLAGS = "false")
rebuild_cmdstan(cores = 4) 
source('speed test - brms poisson fit.R') # 8.8 and 9.4, 9 and 8.5, sometimes slower


## so maybe go back to default without mtune?
library(cmdstanr)
cmdstan_make_local()  # both march and mtune here
# [1] "CXXFLAGS+= -O3 -march=native -mtune=native" "CXXFLAGS += -Wno-nonnull -D_UCRT"          
# [3] "CXXFLAGS += -Wno-deprecated-declarations"   "TBB_CXXFLAGS= -D_UCRT"
cpp_options <- list(
  "CXXFLAGS+= -O3 -march=native", # drop mtune
  # these were there for my default installation:
  "CXXFLAGS += -Wno-nonnull -D_UCRT",
  "CXXFLAGS += -Wno-deprecated-declarations",
  "TBB_CXXFLAGS= -D_UCRT"
)
cmdstan_make_local(cpp_options = cpp_options, append = F)
cmdstan_make_local() # ok
Sys.setenv(PKG_BUILD_EXTRA_FLAGS = "false")
rebuild_cmdstan(cores = 4) 
source('speed test - brms poisson fit.R') # 9 and 9.8, 9.4 and 9.3
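
The per-chain timings from the first rebuild in the log above can be averaged across chains as sketched here. The matrix simply re-enters the six rows printed by rstan::get_elapsed_time(). With a live fit, colMeans(rstan::get_elapsed_time(fit1$fit)) should give the same summary directly.

```r
# Timings copied from the get_elapsed_time() output in the log (seconds).
elapsed <- matrix(
  c(8.557, 8.418,
    8.655, 7.906,
    8.593, 8.456,
    8.263, 8.022,
    8.964, 8.425,
    8.809, 8.208),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("chain:", 1:6), c("warmup", "sample"))
)

# Mean warmup and sampling time across the six chains.
chain_means <- colMeans(elapsed)
print(round(chain_means, 2))
```

Comparing these two means between rebuilds is less noisy than eyeballing individual chains, though run-to-run variation of a few tenths of a second still seems to be expected here.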

I don’t have many more ideas, but here is one other thing that has caused me problems in the past: could there be previously-installed versions of these packages in a different directory, and R is loading the previous installations instead of these new installations with different flags?

You could look at .libPaths() to ensure the first directory is the one where you are installing the packages. And, to load packages in one of the other directories, do something like library(cmdstanr, lib.loc = .libPaths()[2])
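
A small sketch of that check: it lists every library directory and whether it contains a copy of a given package. The helper name where_installed() is made up for illustration, and the base package "stats" is used only so the snippet runs anywhere; swap in "cmdstanr", "brms" or "rstan".

```r
# Report, for every library in .libPaths(), whether it holds a copy of `pkg`.
# where_installed() is a hypothetical helper, not part of any package.
where_installed <- function(pkg) {
  libs <- .libPaths()
  data.frame(
    library = libs,
    installed = dir.exists(file.path(libs, pkg)),
    stringsAsFactors = FALSE
  )
}

# Base package used so the sketch runs anywhere.
print(where_installed("stats"))

# The copy R resolves: loaded namespaces first, then .libPaths() in order.
print(find.package("stats"))
```

More than one TRUE in the installed column would point to duplicate installations worth cleaning up.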


With all the (understandable) frustration with Windows, I assume you’re not using WSL. It’s not solving the problem directly, and there may still be some performance difference due to the overhead of having it run within Windows, but since it’s a real Linux kernel, you shouldn’t have any problems of the OS interacting with Stan in unexpected and difficult to solve ways.


Many thanks for the good ideas. I’ve also had issues with .libPaths in the past for other R matters and had forgotten about it. As far as I can tell, it shouldn’t be a path issue. My .libPaths() returns two locations, the 1st being my user folder and the 2nd being C:/Program Files/R/etc. I believe cmdstanr has always installed to the 1st, and trying library(cmdstanr, lib.loc = .libPaths()[2]) correctly returned an error (“no package called…”).

I’m not 100% certain yet that this isn’t an issue, as I did most recently install R with admin privileges during my attempts to get alternative BLAS to work and it’s quite possible that I get these paths mixed up. So I’m still checking…

In the meantime, I’m going to use another machine for heavy lifts and look into WSL for this machine.

Correct, have never tried WSL. I’m reading into it and think I’ll give it a go when I next have a good free afternoon! Thanks for the suggestion.
I also just noticed the wsl argument in install_cmdstan.