brms seems to ignore the "cores" option while fitting a model

Hello, I am relatively new to Bayesian modeling, and recently encountered the following problem:

I wanted to see if I could speed up my computations, so I set the number of cores to 8 in the brm function. However, it seems to have no effect: my CPU usage stays at around 24%.

This is what the sampler reports:

Chain 1: Gradient evaluation took 0.013708 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 137.08 seconds.

Moreover, when I run the same script on our lab server (which in theory has 10x more cores than my laptop) with cores set to 80, the model takes more or less the same amount of time to complete as on my laptop.
It does not matter whether I run the script from RStudio or from R directly.

My question is: is this expected? Is there a way to speed up the computations (for example, by also using the graphics card)? What should I add to my code?

Or maybe, with my relatively small data set, I will not see a difference and this is the best I can get?

library(tidyverse)

# Simulate 40 participants with 520 trials each
fake_data <- data.frame(
  id = rep(1:40, times = 520)
) %>%
  group_by(id) %>%
  # `trial` is reconstructed here as the within-participant trial index (1..520)
  mutate(trial = row_number(),
         task = case_when(
           trial %in% c(1:260) & id %in% c(1:20) ~ "low",
           trial %in% c(261:520) & id %in% c(1:20) ~ "high",
           trial %in% c(261:520) & id %in% c(21:40) ~ "low",
           trial %in% c(1:260) & id %in% c(21:40) ~ "high"
         ),
         rating = rep(sample(c(1:4)), times = 130)) %>%
  group_by(id, task) %>%
  mutate(pre_rating = lag(rating)) %>%
  # drop the first trial of each block, which has no previous rating
  filter(!is.na(pre_rating)) %>%
  mutate(pre_rating = factor(pre_rating, labels = c("a", "b", "c", "d")))

# Bayesian Estimation
library(brms)

options(mc.cores = parallel::detectCores())

# Model example
m <- brm(formula = bf(rating ~ pre_rating * task + (1 | id)),
         data = fake_data,
         family = cumulative(link = "probit", threshold = "flexible"),
         sample_prior = TRUE,
         chains = 2,
         iter = 10000,
         cores = 8,
         warmup = 5000,
         file = "test")

  • Laptop: Lenovo Yoga 720-15IKB, Intel(R) Core™ i7-7700HQ CPU @ 2.80GHz 2.80 GHz, 16 GB RAM

  • Operating System: Windows 10 Home x64

  • R.version:
    platform x86_64-w64-mingw32
    arch x86_64
    os mingw32
    crt ucrt
    system x86_64, mingw32
    major 4
    minor 3.0
    year 2023
    month 04
    day 21
    svn rev 84292
    language R
    version.string R version 4.3.0 (2023-04-21 ucrt)
    nickname Already Tomorrow

  • R Studio Version
    [1] "desktop"
    [1] ‘2023.3.0.386’
    [1] "2023.03.0+386"
    [1] "Cherry Blossom"

  • brms Version: brms_2.19.0

R version 4.3.0 (2023-04-21 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

[1] LC_COLLATE=English_United Kingdom.utf8
[2] LC_CTYPE=English_United Kingdom.utf8
[3] LC_MONETARY=English_United Kingdom.utf8
[5] LC_TIME=English_United Kingdom.utf8

time zone: Europe/Warsaw
tzcode source: internal

attached base packages:
[1] parallel stats graphics grDevices utils datasets
[7] methods base

other attached packages:
[1] beepr_1.3 loo_2.6.0 brms_2.19.0 Rcpp_1.0.10
[5] sm_2.2-5.7.1 lubridate_1.9.2 forcats_1.0.0 stringr_1.5.0
[9] dplyr_1.1.2 purrr_1.0.1 readr_2.1.4 tidyr_1.3.0
[13] tibble_3.2.1 ggplot2_3.4.2 tidyverse_2.0.0

loaded via a namespace (and not attached):
[1] gridExtra_2.3 inline_0.3.19 sandwich_3.0-2
[4] rlang_1.1.0 magrittr_2.0.3 multcomp_1.4-23
[7] matrixStats_0.63.0 compiler_4.3.0 mgcv_1.8-42
[10] callr_3.7.3 vctrs_0.6.2 reshape2_1.4.4
[13] pkgconfig_2.0.3 crayon_1.5.2 fastmap_1.1.1
[16] backports_1.4.1 ellipsis_0.3.2 utf8_1.2.3
[19] threejs_0.3.3 promises_1.2.0.1 markdown_1.6
[22] tzdb_0.3.0 nloptr_2.0.3 ps_1.7.5
[25] jsonlite_1.8.4 later_1.3.0 prettyunits_1.1.1
[28] R6_2.5.1 dygraphs_1.1.1.6 stringi_1.7.12
[31] StanHeaders_2.26.22 boot_1.3-28.1 estimability_1.4.1
[34] rstan_2.26.22 audio_0.1-10 zoo_1.8-12
[37] base64enc_0.1-3 bayesplot_1.10.0 httpuv_1.6.9
[40] Matrix_1.5-4 splines_4.3.0 igraph_1.4.2
[43] timechange_0.2.0 tidyselect_1.2.0 rstudioapi_0.14
[46] abind_1.4-5 codetools_0.2-19 miniUI_0.1.1.1
[49] curl_5.0.0 processx_3.8.1 pkgbuild_1.4.0
[52] lattice_0.21-8 plyr_1.8.8 shiny_1.7.4
[55] withr_2.5.0 bridgesampling_1.1-2 posterior_1.4.1
[58] coda_0.19-4 survival_3.5-5 RcppParallel_5.1.7
[61] xts_0.13.1 pillar_1.9.0 tensorA_0.36.2
[64] checkmate_2.1.0 DT_0.27 stats4_4.3.0
[67] shinyjs_2.1.0 distributional_0.3.2 generics_0.1.3
[70] hms_1.1.3 rstantools_2.3.1 munsell_0.5.0
[73] scales_1.2.1 minqa_1.2.5 gtools_3.9.4
[76] xtable_1.8-4 gamm4_0.2-6 glue_1.6.2
[79] emmeans_1.8.5 projpred_2.5.0 tools_4.3.0
[82] shinystan_2.6.0 lme4_1.1-32 colourpicker_1.2.0
[85] mvtnorm_1.1-3 grid_4.3.0 crosstalk_1.2.0
[88] colorspace_2.1-0 nlme_3.1-162 cli_3.6.1
[91] fansi_1.0.4 Brobdingnag_1.2-9 V8_4.3.0
[94] gtable_0.3.3 digest_0.6.31 TH.data_1.1-2
[97] htmlwidgets_1.6.2 farver_2.1.1 htmltools_0.5.5
[100] lifecycle_1.0.3 mime_0.12 shinythemes_1.2.0
[103] MASS_7.3-58.4

By default, brms uses at most one core per chain, so with chains = 2 your model can keep only two cores busy no matter how high you set cores. On a CPU with 8 logical cores that is about 25% utilisation, which matches what you see, and it is also why the 80-core server is no faster.

Parallelizing across chains like this is embarrassingly parallel and will essentially always yield a good speedup as long as you have enough memory.

brms can also parallelize within chains, but for this you need to pass the threads argument to brms::brm. The speedups here are more variable, and sometimes this doesn't help at all. For more, see the brms vignette "Running brms models with within-chain parallelization".
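
To make the two levels of parallelism concrete, here is a minimal sketch rather than a tuned recommendation. The helper cores_in_use() is hypothetical (it is not part of brms) and only does the arithmetic described above. fit_model() wraps the model from the question with four chains and two threads per chain. It assumes the cmdstanr backend is installed, which is the backend the vignette uses, and the choice of 4 chains x 2 threads is purely illustrative.

```r
# Hypothetical helper (not part of brms): how many cores a fit can keep busy.
# Chains run in parallel up to `cores`; each chain can use `threads` threads.
cores_in_use <- function(chains, cores, threads = 1) {
  min(chains, cores) * threads
}

cores_in_use(chains = 2, cores = 8)               # call from the question: 2
cores_in_use(chains = 4, cores = 4, threads = 2)  # call sketched below: 8

# Sketch of the model from the question with both kinds of parallelism.
# Assumes brms and a working cmdstanr installation; `data` is expected to
# look like `fake_data` from the question.
fit_model <- function(data) {
  brms::brm(
    formula = brms::bf(rating ~ pre_rating * task + (1 | id)),
    data    = data,
    family  = brms::cumulative(link = "probit", threshold = "flexible"),
    sample_prior = TRUE,
    chains  = 4,
    cores   = 4,                     # one core per chain
    threads = brms::threading(2),    # two threads within each chain
    backend = "cmdstanr",
    iter    = 10000,
    warmup  = 5000
  )
}

# m <- fit_model(fake_data)
```

Whether the within-chain threads pay off depends on the model and data, so it is worth timing a short run with and without the threads argument before committing to a long one.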