Between and within chain parallelization: threads and cores for multi vs. hyperthreading

rwilcom · July 31, 2022, 9:51pm

I’ve reviewed the documentation for parallelization within and between chains in brms but find it confusing, particularly the multithreading vignette, “Running brms models with within-chain parallelization”. I’m hoping you might be able to answer some questions to (dis)confirm my understanding.

To help me understand how arguments of brms functions work, let’s consider an example in which I have 12 physical cores and 24 logical cores and will always use 4 chains.

Case 1: Between-chain parallelization only

Does cores = chains generally maximize performance?
My understanding is that setting cores > chains will not offer improvement because each chain can only be run on one core. Accordingly, my 12 physical cores won’t offer improvement over a 4 physical core machine.

Case 2: Within-chain parallelization with `backend = "cmdstanr"`

Will setting cores > chains fail to improve performance for the same reason as in the between-chain-only case?
If I want multithreading but not hyperthreading, can I set `12 >= cores * k`, where k is the scalar in `threads = threading(k)`? Since between-chain parallelization is faster than within-chain, I’ll pretty much always want to set `cores = chains` and therefore k = 3. This will use one thread per physical core while using all physical cores. However, maybe I misunderstand k and it is actually the number of logical cores to use per physical core: since each physical core has only two logical processors, setting k > 2 would then not be possible.
The most confusing part is when I want to use hyperthreading so that I use all 24 logical cores (or close to it) corresponding to my 12 physical cores.
(A) Does this require cores > chains? Is this NOT guaranteed to be faster?
(B) What are the constraints on how these values relate to each other? Must the constraint `24 >= cores * k` be met? So now the decision would be between values such as `cores = 6, threads = threading(4)` and `cores = 12, threads = threading(2)`?
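
To make the arithmetic in these questions concrete, here is a rough sketch in base R. It assumes that at most `min(chains, cores)` chains run at the same time and that each running chain uses k threads; `peak_threads` is a hypothetical helper for illustration, not a brms function, and the sketch says nothing about which setting is actually fastest.

```r
# Hypothetical helper (not part of brms): number of threads busy at once,
# assuming min(chains, cores) chains run concurrently with k threads each.
peak_threads <- function(chains, cores, k = 1) {
  min(chains, cores) * k
}

chains <- 4
peak_threads(chains, cores = 4,  k = 3)  # 12: matches the 12 physical cores
peak_threads(chains, cores = 12, k = 2)  # 8: only 4 chains exist to run
peak_threads(chains, cores = 6,  k = 4)  # 16
peak_threads(chains, cores = 4,  k = 6)  # 24: matches the 24 logical cores
```

Under that assumption, the quantity to compare with the 12 physical or 24 logical cores would be `min(chains, cores) * k` rather than `cores * k`.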

cour10eygrace · July 10, 2024, 11:54pm

I have very similar questions. Have you gotten any replies or figured this out on your own, @rwilcom? I have read the linked article (Running brms models with within-chain parallelization) but am still not totally clear on whether cores > chains will speed things up and/or whether the threading value must be less than cores/chains. Thanks!!

Ax3man · July 12, 2024, 7:12pm

Hey Courtney. The number of threads is the number per “core”. So if you have 4 chains, you can set cores = 4 and threads = threading(2) to use 8 threads in total.
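
A minimal sketch of what that setup could look like in code. The wrapper `fit_threaded` and the variable names are hypothetical and only for illustration; `brm()`, `threading()`, and the `chains`, `cores`, `threads`, and `backend` arguments are the brms ones discussed in this thread. Actually calling the wrapper assumes brms, cmdstanr, and a working CmdStan installation.

```r
# Settings from the reply above: 4 chains, one core per chain, 2 threads each.
n_chains <- 4
n_threads_per_chain <- 2
n_total_threads <- n_chains * n_threads_per_chain  # 8 threads in total

# Hypothetical wrapper (not from the thread). Defining it needs no packages;
# calling it needs brms, cmdstanr, and CmdStan.
fit_threaded <- function(formula, data) {
  brms::brm(
    formula,
    data    = data,
    chains  = n_chains,
    cores   = n_chains,                              # between-chain parallelization
    threads = brms::threading(n_threads_per_chain),  # within-chain parallelization
    backend = "cmdstanr"
  )
}
```

It could then be called as, for example, `fit <- fit_threaded(y ~ x, data = my_data)`, where `my_data` is a placeholder for your own data frame.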

cour10eygrace · July 15, 2024, 8:20pm

thanks Wouter!
