Hello, I am using CmdStanR on a Linux machine that has 20 cores. How do I speed things up to ensure CmdStanR is taking advantage of all the cores when it runs?
Is it the parallel_chains = getOption("mc.cores", 1) argument? What else do I need to do? Thanks
You have to use the parallel_chains argument when running the sampler with $sample().
But that’s for parallel chains, and 20 would definitely be overkill. You should look into within-chain parallelism. Here’s the intro post on using reduce-sum in Stan code, and here’s a post on brms.
With 20 cores (and make sure those are physical cores and not logical), I’d do 5 parallel chains, each with 4 cores doing within-chain parallelism.
Thanks. Is that all I need to do? What about elsewhere in R?
For simple parallel chains, that's it: just supply a value to the parallel_chains argument reflecting the number of chains you want to run in parallel.
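To make the simple case concrete, here's a minimal sketch (the model file name and data list are placeholders):

```r
library(cmdstanr)

mod <- cmdstan_model("model.stan")  # placeholder model file

# 4 chains, run simultaneously on 4 separate cores
fit <- mod$sample(
  data = stan_data,      # placeholder data list
  chains = 4,
  parallel_chains = 4
)
```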
Sorry for being redundant. But in my case, the machine has 20 cores. If I run 4 chains in the sampling, is it possible to use 5 cores per chain?
Yup! Sorry, I should have clarified; I was just adding the recommendation as you posted your reply. It's kind of a toss-up between 4x5 and 5x4: there are diminishing returns with within-chain parallelism, and I've always been uncomfortable with how few chains the default of 4 gives for computing rhat, hence my recommendation for more chains and fewer cores per chain.
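For the 5-chains-by-4-threads setup, the model also has to be compiled with threading enabled, otherwise threads_per_chain has no effect. A sketch (file name and data are placeholders):

```r
library(cmdstanr)

# Compile with threading support so threads_per_chain works
mod <- cmdstan_model("model.stan", cpp_options = list(stan_threads = TRUE))

# 5 chains in parallel, each using 4 threads: 5 x 4 = 20 cores total
fit <- mod$sample(
  data = stan_data,        # placeholder data list
  chains = 5,
  parallel_chains = 5,
  threads_per_chain = 4
)
```

Note that threads_per_chain only buys you anything if the Stan program itself uses reduce_sum or map_rect.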
Agree with everything @mike-lawrence says here, but also just want to emphasize that not all models are amenable to speedup with as many as 5 cores per chain. Some parts of the execution will not be parallelizable, and these will never benefit from additional cores per chain. The parallelizable parts will trade off against the parallelization overhead. At some point, the benefits of spinning up more cores will be outweighed by the costs (for example, copying all of the parameter values needed by each core). Before you set your model blazing away on 20 cores, run some very short chains to ensure that 20 is actually faster than 8 or 12.
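One way to run that comparison is with short timing runs at a few thread counts (the iteration counts here are just for timing, not for inference; model and data are placeholders):

```r
library(cmdstanr)

mod <- cmdstan_model("model.stan", cpp_options = list(stan_threads = TRUE))

# Time short runs at a few thread counts before committing all 20 cores
for (threads in c(1, 2, 5)) {
  fit <- mod$sample(
    data = stan_data,          # placeholder data list
    chains = 4,
    parallel_chains = 4,
    threads_per_chain = threads,
    iter_warmup = 100,
    iter_sampling = 100,
    refresh = 0
  )
  cat(threads, "threads per chain:", fit$time()$total, "seconds\n")
}
```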
Hi, this may be an R-specific question. I notice that the summary() function usually takes a bit of time. Is there something I can do to speed that up, again ideally by taking advantage of more cores? Thanks.
posterior::summarise_draws() has an option to compute the summaries in parallel. If the draws array is really big this can be memory intensive, so be cautious about using 20 cores.
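For example, with the example draws bundled with posterior (the .cores value here is arbitrary):

```r
library(posterior)

draws <- example_draws("eight_schools")

# Compute the requested summaries across 2 worker processes
summarise_draws(draws, "mean", "sd", "rhat", .cores = 2)
```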
Hi, do you know the right syntax for the following command? Thanks.
fit$draws(c("beta", "sigma")) %>% summarise_draws("mean", "sd", "rhat", .cores = 4)
I got an error:
Error: Cannot find function '4'.
I can’t immediately spot the error.
example_draws("eight_schools") %>% summarise_draws("mean", "median", "sd", "rhat", .cores = 2) works fine for me. Do you have a recent version of posterior installed? The .cores argument was added recently.
fit$summary() calls posterior::summarise_draws() internally, so you should just be able to do this:
fit$summary(variables = c("beta", "sigma"), "mean", "sd", "rhat", .cores = 4)
Either way, like @jsocolar mentioned, you'll need a recent installation of posterior, and you might need the development version of posterior rather than the latest release. The easiest way to install that is directly from GitHub.
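In case the exact command helps, one common way to install the development version (assuming you have the remotes package installed) is:

```r
# Install the development version of posterior from GitHub
remotes::install_github("stan-dev/posterior")
```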
Thank you! I was using version 0.1.5. The update works perfectly.
Hello. Sorry to keep coming back. Do you have an example using reduce_sum for a multilevel model? I can't seem to find one. Thank you.