I see the following suggestion in the vignette about within-chain parallelization using brms:
For a given Stan model one should usually choose the number of chains and the number of threads per chain to be equal to the number of (physical) cores one wishes to use.
Suppose I have 24 CPUs on a computer. To take full advantage of the potential speed gain, I would choose 6 threads per chain for a model using 4 chains. I expected all 24 CPUs to be active most of the time, but that does not seem to be the case: when I checked the CPU usage, I rarely saw more than 4 CPUs in use.
If I want to simultaneously run 5 separate models, can I still choose 6 threads per chain for each of the 5 models? Is this considered hyper-threading?
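For reference, a minimal sketch of how such a run would be set up in brms (the formula, data object `df`, and variable names here are hypothetical placeholders, not from the original post):

```r
library(brms)

# Hypothetical model; `df` stands in for your 32005-row data.frame.
fit <- brm(
  y ~ x + (1 | g), data = df,
  chains  = 4, cores = 4,      # 4 chains run as 4 parallel processes
  threads = threading(6),      # 6 threads per chain -> 24 threads in total
  backend = "cmdstanr"         # within-chain threading requires the cmdstanr backend
)
```

With this configuration, 5 such models running simultaneously would request 120 threads in total, far more than the 24 physical cores.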
How large is your data set? The minimal grainsize enforced is 100, which limits parallelism in cases with few data rows.
You should not fire off more Stan threads at the same time than you have physical CPU cores… but there can always be exceptions to this “rule of thumb”.
The data.frame has 32005 rows and 4 columns in long format.
Could you elaborate on how grainsize is defined?
In my test of a model with 4 chains and 8 threads per chain, I didn’t see more than 4 CPUs involved on the few occasions I checked the CPU usage. Still, the runtime was 4 times shorter than the original job with no within-chain parallelization, which is why I thought I might be able to run several models simultaneously through hyper-threading.
The default grainsize is max(100, data rows / (2 * # of threads requested))… I do not think you need to bother with this.
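Under that formula, the default for this data set works out as follows (a sketch of the stated rule, not the exact brms source; the function name is made up for illustration):

```r
# Default grainsize as described above: max(100, rows / (2 * threads))
default_grainsize <- function(n_rows, n_threads) {
  max(100, floor(n_rows / (2 * n_threads)))
}

default_grainsize(32005, 8)  # 8 threads per chain -> grainsize 2000
default_grainsize(500, 8)    # few rows -> the enforced minimum of 100 kicks in
```

So with 32005 rows the minimum of 100 is not binding, and each chain has roughly 16 chunks of work to spread across its 8 threads.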
Getting a 4x speedup with 8 threads per chain is very good.
Just try things out. It’s very hard to give general advice here… whether you want the greatest throughput or the shortest walltime per model run will determine what you should do.