Speeding up CmdStanR by using more cores?

sonicking · June 23, 2021, 6:17pm

Hello, I am using CmdStanR on a Linux machine that has 20 cores. How do I speed things up to ensure CmdStanR is taking advantage of all the cores when it runs?

Is it parallel_chains = getOption("mc.cores", 1)? What else do I need to do? Thanks.

mike-lawrence · June 23, 2021, 6:35pm

You have to use the parallel_chains argument when running mod$sample(...).

But that only runs whole chains in parallel, and 20 chains would definitely be overkill. You should look into within-chain parallelism. Here’s the intro post on using reduce_sum in Stan code, and here’s a post on brms.

With 20 cores (and make sure those are physical cores and not logical), I’d do 5 parallel chains, each with 4 cores doing within-chain parallelism.
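As a rough sketch of the 5 x 4 split described above: the helper `split_cores()` and the wrapper `run_model()` are hypothetical names made up for illustration, and `stan_file` and `data_list` stand in for your own model and data. The cmdstanr pieces it relies on are `cpp_options = list(stan_threads = TRUE)` at compile time and the `chains`, `parallel_chains`, and `threads_per_chain` arguments of `$sample()`. Extra threads only help if the Stan program itself uses something like reduce_sum.

```r
# R's documentation notes that `logical = FALSE` is not honoured on every
# platform, so on Linux it may be worth cross-checking the count with `lscpu`.
print(parallel::detectCores(logical = FALSE))

# Hypothetical helper (not part of cmdstanr): split a core budget into
# chains that run in parallel and threads available to each chain.
split_cores <- function(total_cores, chains) {
  stopifnot(chains >= 1, total_cores >= chains)
  list(
    chains = chains,
    parallel_chains = chains,
    threads_per_chain = total_cores %/% chains
  )
}

plan <- split_cores(total_cores = 20, chains = 5)
str(plan)

# Hypothetical wrapper showing where each setting goes in cmdstanr.
# `stan_file` must point to a model written with reduce_sum().
run_model <- function(stan_file, data_list, plan) {
  mod <- cmdstanr::cmdstan_model(
    stan_file,
    cpp_options = list(stan_threads = TRUE)  # enable threading when compiling
  )
  mod$sample(
    data = data_list,
    chains = plan$chains,
    parallel_chains = plan$parallel_chains,
    threads_per_chain = plan$threads_per_chain
  )
}
```

Calling `split_cores(20, 4)` instead gives the 4 x 5 layout discussed later in the thread.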

sonicking · June 23, 2021, 6:39pm

Thanks. Is that all I need to do? What about elsewhere in R?

mike-lawrence · June 23, 2021, 6:41pm

For simple parallel chains, that’s it, just supply a value to the parallel_chains argument reflecting the number of chains you want to run in parallel.

sonicking · June 23, 2021, 6:43pm

Sorry for being redundant. But in my case, the machine has 20 cores. If I run 4 chains in the sampling, is it possible to use 5 cores per chain?

mike-lawrence · June 23, 2021, 6:43pm

Yup! Sorry, I should have clarified; I was adding the recommendation just as you posted your reply. It’s kind of a toss-up between 4x5 and 5x4. There are diminishing returns with within-chain parallelism, and I’ve always been uncomfortable with how few chains the default of 4 gives for computing rhat, hence my recommendation of more chains and fewer cores per chain.

sonicking · June 23, 2021, 6:44pm

Thank you very much

jsocolar · June 23, 2021, 6:52pm

Agree with everything @mike-lawrence says here, but also just want to emphasize that not all models are amenable to speedup with as many as 5 cores per chain. Some parts of the execution will not be parallelizable, and these will never benefit from additional cores per chain. The parallelizable parts will trade off against the parallelization overhead. At some point, the benefits of spinning up more cores will be outweighed by the costs (for example, copying all of the parameter values needed by each core). Before you set your model blazing away on 20 cores, run some very short chains to ensure that 20 is actually faster than 8 or 12.
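One way to run that check is sketched below. `time_threads()` is a hypothetical helper, not a cmdstanr function. It assumes `mod` is a CmdStanModel compiled with `cpp_options = list(stan_threads = TRUE)` and that `data_list` is your data. It runs deliberately short chains at several thread counts and records the total wall time reported by `fit$time()`. Very short runs are dominated by warmup, so treat the numbers as a rough guide only.

```r
# Hypothetical helper: time short runs at several threads-per-chain settings.
time_threads <- function(mod, data_list, thread_counts = c(1, 2, 4),
                         chains = 4, iter = 100) {
  secs <- vapply(thread_counts, function(n_threads) {
    fit <- mod$sample(
      data = data_list,
      chains = chains,
      parallel_chains = chains,
      threads_per_chain = n_threads,
      iter_warmup = iter,      # short on purpose: this is only a timing probe
      iter_sampling = iter,
      refresh = 0,             # suppress progress output
      seed = 1                 # same seed so the runs are comparable
    )
    fit$time()$total           # total wall time in seconds
  }, numeric(1))

  data.frame(
    threads_per_chain = thread_counts,
    total_cores = thread_counts * chains,
    seconds = secs
  )
}

# Usage, assuming `mod` and `data_list` already exist:
# time_threads(mod, data_list, thread_counts = c(2, 3, 5))
```

If the time stops dropping as `threads_per_chain` grows, the extra cores are better spent on more chains or left idle.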

sonicking · June 23, 2021, 7:03pm

Hi, this may be an R-specific question. I notice that the summary() function usually takes a bit of time. Is there something I can do to speed that up, ideally by taking advantage of more cores again? Thanks.

jsocolar · June 23, 2021, 7:12pm

posterior::summarise_draws() has an option to compute the summaries in parallel. If the draws array is really big this can be memory intensive, so be cautious about using 20 cores.

sonicking · June 23, 2021, 8:48pm

Hi, do you know the right syntax for the following command? Thanks.

fit$draws(c("beta", "sigma")) %>% summarise_draws("mean", "sd", "rhat", .cores = 4)

I got an error:

Error: Cannot find function '4'.

jsocolar · June 23, 2021, 8:55pm

I can’t immediately spot the error. This works fine for me:

example_draws("eight_schools") %>% summarise_draws("mean", "median", "sd", "rhat", .cores = 2)

Do you have a recent version of posterior installed? The .cores argument was added recently.
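For anyone wanting to check their own installation, here is a small self-contained sketch that uses only the example draws shipped with posterior, so no fitted model is needed. It prints the installed version, then computes the same summaries serially and with `.cores = 2`. The choice of `mu` and `tau` is just for illustration.

```r
library(posterior)

# The thread reports that version 0.1.5 was too old for `.cores`.
print(packageVersion("posterior"))

# Example draws bundled with posterior.
draws <- example_draws("eight_schools")

# Restrict to a few variables first, as in the question above.
sub <- subset_draws(draws, variable = c("mu", "tau"))

# Same summaries, computed serially and then across 2 cores.
serial <- summarise_draws(sub, "mean", "sd", "rhat")
parallel <- summarise_draws(sub, "mean", "sd", "rhat", .cores = 2)

print(serial)
print(all.equal(serial$mean, parallel$mean))
```

With only two variables the parallel call will not be faster; the benefit shows up when there are many variables to summarise.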

jonah · June 23, 2021, 9:13pm

Also, fit$summary() calls posterior::summarise_draws() internally, so you should just be able to do this:

fit$summary(variables = c("beta", "sigma"), "mean", "sd", "rhat", .cores = 4)

Either way, like @jsocolar mentioned, you’ll need a recent installation of posterior, and you might need the development version of posterior rather than the latest beta release. The easiest way to install that is:

remotes::install_github("stan-dev/posterior")

sonicking · June 23, 2021, 10:40pm

Thank you! I was using version 0.1.5. The update works perfectly.

sonicking · June 24, 2021, 7:44pm

Hello. Sorry to keep coming back. Do you have an example of using reduce_sum for a multilevel model? I haven’t come across one. Thank you.

Richard_Erickson · April 1, 2024, 6:41pm

I was browsing the forum and saw this post. I’ve created example occupancy models that are multilevel both biologically and statistically.

Here’s the link to the Stan models: inst · main · UMESC / quant-ecology / occStanhm · GitLab (usgs.gov). Specifically, see these two files: inst/occupancy_3.stan · main · UMESC / quant-ecology / occStanhm · GitLab (usgs.gov) and inst/occstanhm_2.stan · main · UMESC / quant-ecology / occStanhm · GitLab (usgs.gov).
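For readers who want something much smaller than the linked occupancy models, below is a minimal sketch of reduce_sum in a multilevel setting. It is not taken from that repository: it is a varying-intercept logistic regression invented for illustration, with simulated data, and every name in it (`partial_sum_lpmf`, `alpha_raw`, `fit_threaded()`, the grainsize of 250) is an assumption. The Stan code uses the newer `array[] int` syntax, so older CmdStan versions would need the older declaration style.

```r
stan_code <- "
functions {
  // Log-likelihood for one slice of the outcomes; reduce_sum supplies
  // the slice together with its start and end positions.
  real partial_sum_lpmf(array[] int y_slice, int start, int end,
                        vector x, array[] int group,
                        real beta, vector alpha) {
    return bernoulli_logit_lpmf(
      y_slice | alpha[group[start:end]] + beta * x[start:end]);
  }
}
data {
  int<lower=1> N;                          // observations
  int<lower=1> J;                          // groups
  array[N] int<lower=0, upper=1> y;
  vector[N] x;
  array[N] int<lower=1, upper=J> group;
  int<lower=1> grainsize;
}
parameters {
  real mu;
  real<lower=0> sigma;
  vector[J] alpha_raw;
  real beta;
}
transformed parameters {
  vector[J] alpha = mu + sigma * alpha_raw;  // non-centered group intercepts
}
model {
  mu ~ normal(0, 2);
  sigma ~ normal(0, 1);
  alpha_raw ~ std_normal();
  beta ~ normal(0, 1);
  target += reduce_sum(partial_sum_lpmf, y, grainsize,
                       x, group, beta, alpha);
}
"

# Simulated data for illustration only.
set.seed(1)
N <- 2000
J <- 20
group <- sample.int(J, N, replace = TRUE)
x <- rnorm(N)
alpha_true <- rnorm(J)
y <- rbinom(N, 1, plogis(alpha_true[group] + 0.5 * x))
data_list <- list(N = N, J = J, y = y, x = x, group = group, grainsize = 250)

# Hypothetical wrapper: compile with threading enabled and sample.
fit_threaded <- function(stan_code, data_list, threads_per_chain = 2) {
  mod <- cmdstanr::cmdstan_model(
    cmdstanr::write_stan_file(stan_code),
    cpp_options = list(stan_threads = TRUE)
  )
  mod$sample(data = data_list, chains = 4, parallel_chains = 4,
             threads_per_chain = threads_per_chain)
}

# fit <- fit_threaded(stan_code, data_list)
```

Only the likelihood is sliced here; the priors on the group-level parameters stay outside reduce_sum because they are cheap and not per-observation. The grainsize is a tuning knob that is worth varying in short test runs.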
