Good day all,
I need help with speeding up my hardware, which is inexcusably slow. I suspect it’s my OS or some aspect of the setup.
I have a new laptop through my university running (unfortunately) Windows 11. The CPU is reasonably new (late 2023) and ought to be quite performant: AMD Ryzen 7 Pro 7735U.
However, I have only felt this pervading sense of sluggishness. Models in brms that, in my experience, should fit in a few seconds now take minutes or longer. Anything halfway complicated seems out of the question.
So I tried a simple benchmark for comparing with my older machines using the Poisson model in ?brms::brm bumped up to 4000 iterations:
library(brms); library(cmdstanr)
set.seed(1234)
fit1 <- brm(
  count ~ zBase * Trt + (1|patient),
  data = epilepsy, family = poisson(),
  prior = prior(normal(0, 10), class = b) +
    prior(cauchy(0, 2), class = sd),
  backend = 'cmdstanr', cores = 6, chains = 6, iter = 4000
)
rstan::get_elapsed_time(fit1$fit)
(Note: parallel::detectCores() returns 16 but the CPU has 8 physical cores, so I stay at or below 8 cores generally.)
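For anyone wanting to check the same thing on their own machine, here is a minimal sketch of how the logical and physical core counts can be compared in base R. The variable names and the fallback to 1 are my own choices for illustration. Per the R documentation, `logical = FALSE` is only honoured on some platforms (Windows among them), so elsewhere both calls may return the same number.

```r
# Logical (hardware threads) vs. physical core counts from base R's parallel package.
logical_cores  <- parallel::detectCores(logical = TRUE)
physical_cores <- parallel::detectCores(logical = FALSE)

# detectCores() can return NA on some platforms; fall back conservatively.
if (is.na(logical_cores))  logical_cores  <- 1L
if (is.na(physical_cores)) physical_cores <- logical_cores

# Never request more cores than there are chains or physical cores.
n_chains     <- 6L
cores_to_use <- min(n_chains, physical_cores)

cat("logical:", logical_cores,
    "| physical:", physical_cores,
    "| cores to pass to brm():", cores_to_use, "\n")
```

The resulting `cores_to_use` could then be passed as `cores = cores_to_use` in the `brm()` call above.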
On this machine, I often get 9 to 10 seconds of warmup and 8ish seconds of sampling (averaging over chains). On a 10-year-old laptop with some mid-range Intel CPU for the time (but running Ubuntu), I get around 5 and 4.2 seconds… so half the time. On a 2019 laptop running Windows but with an Intel CPU (i7-9750H), I get 5.3 and 6 seconds. Neither of these older CPUs should in theory outperform my current CPU, so I figured it’s something else. A colleague with a recent Intel CPU and Windows 11 (and presumably all the same university IT settings) gets 5.3 and 4.8 seconds.
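To make "averaging chains" concrete, this is roughly how the per-chain times can be collapsed into one warmup and one sampling figure. It is only a sketch: `mean_elapsed()` is a hypothetical helper name, and it assumes the input is the chains-by-phase matrix with `warmup` and `sample` columns that `rstan::get_elapsed_time()` returns.

```r
# Average per-chain warmup and sampling times (in seconds).
# `elapsed` is expected to be a matrix with one row per chain and
# columns named "warmup" and "sample".
mean_elapsed <- function(elapsed) {
  stopifnot(is.matrix(elapsed),
            all(c("warmup", "sample") %in% colnames(elapsed)))
  # drop = FALSE keeps the matrix shape even when there is a single chain.
  round(colMeans(elapsed[, c("warmup", "sample"), drop = FALSE]), 1)
}

# Only runs if the fit from the benchmark above is in the workspace.
if (exists("fit1")) {
  print(mean_elapsed(rstan::get_elapsed_time(fit1$fit)))
}
```

Reporting the mean over chains keeps the comparison across machines to two numbers, though a single slow chain would be hidden by it, so looking at the full matrix as well may be worthwhile.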
I think I’ve disabled Windows 11 ‘efficiency’ power settings and told the OEM software to use high-performance everywhere. I’ve tried fresh installs of R, Rtools, rstan, cmdstan, and so on. I wasted a day attempting to switch out my BLAS without success (only later did I realise that BLAS likely has little to do with Stan performance). I learned a bit in that process about differences between Intel and AMD CPUs and their optimisation for scientific computing. But this is now beyond my level.
Is there something I should look for during my Stan installation? Some setting I may have missed? Something particular to AMD chips?
Any pointers are much appreciated. This is driving me insane! I’m not a hardware guy or speed demon, but would just like my brms fits to get a move on :^)
sessionInfo() call for the machine in question
> sessionInfo()
R version 4.5.1 (2025-06-13 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26100)
Matrix products: default
LAPACK version 3.12.1
locale:
[1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8 LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C LC_TIME=English_United States.utf8
time zone: Europe/Copenhagen
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] cmdstanr_0.9.0.9000 brms_2.23.0 Rcpp_1.1.0
loaded via a namespace (and not attached):
[1] Matrix_1.7-3 bayesplot_1.14.0 jsonlite_2.0.0 gtable_0.3.6 dplyr_1.1.4
[6] compiler_4.5.1 tidyselect_1.2.1 stringr_1.5.2 parallel_4.5.1 scales_1.4.0
[11] lattice_0.22-7 coda_0.19-4.1 ggplot2_4.0.0 R6_2.6.1 Brobdingnag_1.2-9
[16] generics_0.1.4 distributional_0.5.0 knitr_1.50 backports_1.5.0 checkmate_2.3.3
[21] tibble_3.3.0 pillar_1.11.1 RColorBrewer_1.1-3 posterior_1.6.1 rlang_1.1.6
[26] stringi_1.8.7 xfun_0.53 S7_0.2.0 RcppParallel_5.1.11-1 estimability_1.5.1
[31] cli_3.6.5 magrittr_2.0.4 ps_1.9.1 emmeans_2.0.0 rstantools_2.5.0
[36] processx_3.8.6 grid_4.5.1 xtable_1.8-4 rstudioapi_0.17.1 mvtnorm_1.3-3
[41] lifecycle_1.0.4 nlme_3.1-168 vctrs_0.6.5 evaluate_1.0.5 tensorA_0.36.2.1
[46] glue_1.8.0 farver_2.1.2 bridgesampling_1.1-2 abind_1.4-8 matrixStats_1.5.0
[51] tools_4.5.1 loo_2.8.0 pkgconfig_2.0.3