Speeding up CmdStanR by using more cores?

Hello, I am using CmdStanR on a Linux machine that has 20 cores. How do I speed things up to ensure CmdStanR is taking advantage of all the cores when it runs?

Is it the parallel_chains = getOption("mc.cores", 1) argument? What else do I need to do? Thanks.

You have to use the parallel_chains argument when running mod$sample(...)

But that only runs chains in parallel, and 20 parallel chains would definitely be overkill. You should look into within-chain parallelism. Here’s the intro post on using reduce_sum in Stan code, and here’s a post on brms.

With 20 cores (and make sure those are physical cores and not logical), I’d do 5 parallel chains, each with 4 cores doing within-chain parallelism.


Thanks. Is that all I need to do? What about elsewhere in R?

For simple parallel chains, that’s it, just supply a value to the parallel_chains argument reflecting the number of chains you want to run in parallel.
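For reference, a minimal sketch of plain parallel chains. It assumes CmdStan is installed and uses the Bernoulli example model that ships with it; the data values are the ones from that example, and nothing here is specific to the original poster's model.

```r
library(cmdstanr)

# Example model bundled with CmdStan (assumes a working CmdStan install).
stan_file <- file.path(cmdstan_path(), "examples", "bernoulli", "bernoulli.stan")
mod <- cmdstan_model(stan_file)

# Data for the bundled Bernoulli example.
stan_data <- list(N = 10, y = c(0, 1, 0, 0, 0, 0, 0, 0, 0, 1))

# Run 4 chains, all 4 at the same time (one core per chain).
fit <- mod$sample(
  data = stan_data,
  chains = 4,
  parallel_chains = 4,
  seed = 123,
  refresh = 0
)

fit$summary()
```

If parallel_chains is smaller than chains, the remaining chains start as earlier ones finish.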


Sorry for being redundant. But in my case, the machine has 20 cores. If I run 4 chains in the sampling, is it possible to use 5 cores per chain?


Yup! Sorry, I should have clarified. I was just adding the recommendation as you posted your reply. It’s kind of a toss-up between 4x5 and 5x4. There are diminishing returns with within-chain parallelism, and I’ve always been uncomfortable with how few chains the default of 4 gives you for computing rhat, hence my recommendation for more chains and fewer cores per chain.


Thank you very much

Agree with everything @mike-lawrence says here, but also just want to emphasize that not all models are amenable to speedup with as many as 5 cores per chain. Some parts of the execution will not be parallelizable, and these will never benefit from additional cores per chain. The parallelizable parts will trade off against the parallelization overhead. At some point, the benefits of spinning up more cores will be outweighed by the costs (for example, copying all of the parameter values needed by each core). Before you set your model blazing away on 20 cores, run some very short chains to ensure that 20 is actually faster than 8 or 12.
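One way to run that check, as a rough sketch rather than a definitive benchmark: the helper below is hypothetical (time_threads is not part of cmdstanr) and assumes `mod` was compiled with cpp_options = list(stan_threads = TRUE) and that the Stan program uses reduce_sum. It runs very short chains at several threads_per_chain settings and records the total wall time reported by fit$time().

```r
# Hypothetical helper (not part of cmdstanr): time very short runs at several
# threads_per_chain settings. `mod` is assumed to be a CmdStanModel compiled
# with cpp_options = list(stan_threads = TRUE).
time_threads <- function(mod, data, threads = c(1, 2, 4, 5), chains = 4) {
  seconds <- vapply(threads, function(k) {
    fit <- mod$sample(
      data = data,
      chains = chains,
      parallel_chains = chains,
      threads_per_chain = k,
      iter_warmup = 100,    # deliberately short: only timing is of interest
      iter_sampling = 100,
      seed = 1,             # same seed so runs are comparable
      refresh = 0
    )
    fit$time()$total        # total wall time in seconds
  }, numeric(1))
  data.frame(threads_per_chain = threads, seconds = seconds)
}
```

Calling time_threads(mod, stan_data) returns one row per setting. Keep chains times threads_per_chain at or below the number of physical cores (4 x 5 = 20 with the defaults above), and stop adding threads once the timings stop improving.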


Hi, this may be an R-specific question. I notice that the summary() function usually takes a bit of time. Is there something I can do to speed that up, ideally by taking advantage of more cores again? Thanks.

posterior::summarise_draws() has an option (.cores) to compute the summaries in parallel. If the draws array is really big this can be memory intensive, so be cautious about using 20 cores.


Hi, do you know the right syntax for the following command? Thanks.

fit$draws(c("beta", "sigma")) %>% summarise_draws("mean", "sd", "rhat", .cores = 4)

I got an error:

Error: Cannot find function ‘4’.

I can’t immediately spot the error.
example_draws("eight_schools") %>% summarise_draws("mean", "median", "sd", "rhat", .cores = 2) works fine for me. Do you have a recent version of posterior installed? The .cores argument was added recently.

Also, fit$summary() calls posterior::summarise_draws() internally, so you should just be able to do this:

fit$summary(variables = c("beta", "sigma"), "mean", "sd", "rhat", .cores = 4)

Either way, as @jsocolar mentioned, you’ll need a recent installation of posterior, and you might need the development version rather than the latest beta release. The easiest way to install that is:

remotes::install_github("stan-dev/posterior")

Thank you! I was using version 0.1.5. The update works perfectly.


Hello. Sorry to keep coming back. Do you have an example using reduce_sum for a multi-level model? I haven’t been able to find one. Thank you.
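
Not from the posts above, but here is a minimal sketch of what that could look like: a varying-intercept logistic regression whose likelihood is split across threads with reduce_sum. Everything in it is illustrative (the simulated data, the priors, and the names partial_sum, u_raw, tau, and grainsize are assumptions, not anyone's actual model), and it assumes CmdStan 2.26 or later for the array[] syntax. It uses the 4 chains x 5 threads layout discussed earlier.

```r
library(cmdstanr)

# Illustrative Stan program: all names and priors are assumptions.
stan_code <- "
functions {
  // Log-likelihood of observations start..end; reduce_sum calls this on slices of y.
  real partial_sum(array[] int y_slice, int start, int end,
                   vector x, array[] int group,
                   real alpha, real beta, vector u) {
    return bernoulli_logit_lpmf(y_slice |
             alpha + u[group[start:end]] + beta * x[start:end]);
  }
}
data {
  int<lower=1> N;
  int<lower=1> J;
  array[N] int<lower=0, upper=1> y;
  vector[N] x;
  array[N] int<lower=1, upper=J> group;
  int<lower=1> grainsize;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> tau;
  vector[J] u_raw;
}
transformed parameters {
  vector[J] u = tau * u_raw;  // non-centered group intercepts
}
model {
  alpha ~ normal(0, 2);
  beta ~ normal(0, 2);
  tau ~ normal(0, 1);
  u_raw ~ std_normal();
  target += reduce_sum(partial_sum, y, grainsize, x, group, alpha, beta, u);
}
"

# Simulated data, purely for illustration.
set.seed(1)
J <- 20
N <- 2000
group <- sample(J, N, replace = TRUE)
x <- rnorm(N)
u_true <- rnorm(J, 0, 0.5)
y <- rbinom(N, 1, plogis(-0.5 + u_true[group] + 0.8 * x))
stan_data <- list(N = N, J = J, y = y, x = x, group = group, grainsize = 1)

# Threading has to be enabled at compile time.
mod <- cmdstan_model(write_stan_file(stan_code),
                     cpp_options = list(stan_threads = TRUE))

# 4 chains in parallel, 5 threads each = 20 cores.
fit <- mod$sample(
  data = stan_data,
  chains = 4,
  parallel_chains = 4,
  threads_per_chain = 5,
  seed = 1,
  refresh = 0
)

fit$summary(variables = c("alpha", "beta", "tau"))
```

With grainsize = 1 the scheduler picks the slice sizes itself; for a real model it is worth trying a few larger values, since the best choice depends on how expensive each observation is.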