# Between- and within-chain parallelization: threads and cores for multithreading vs. hyperthreading

I’ve reviewed the documentation for parallelization within and between chains in brms but find it confusing, particularly the multithreading vignette, “Running brms models with within-chain parallelization”. I’m hoping you might be able to answer some questions to (dis)confirm my understanding.

To help me understand how arguments of brms functions work, let’s consider an example in which I have 12 physical cores and 24 logical cores and will always use 4 chains.

### Case 1: Between-chain parallelization only

1. Does cores = chains generally maximize performance?
My understanding is that setting cores > chains will not improve performance because each chain can only run on one core. Accordingly, my 12 physical cores won’t offer any improvement over a machine with 4 physical cores.
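To make Case 1 concrete, here is a minimal sketch of the call I have in mind. The formula, the simulated data frame `dat`, and the object name `fit_between` are placeholders I made up for illustration; only `chains` and `cores` matter for the question.

```r
library(brms)

# Hypothetical toy data, for illustration only
set.seed(1)
dat <- data.frame(x = rnorm(100))
dat$y <- 1 + 2 * dat$x + rnorm(100)

# Between-chain parallelization only: 4 chains, each on its own core.
# My assumption is that cores = 12 would behave the same as cores = 4 here.
fit_between <- brm(
  y ~ x,
  data   = dat,
  chains = 4,
  cores  = 4
)
```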

### Case 2: Within-chain parallelization with backend = "cmdstanr"

1. Will setting cores > chains fail to improve performance for the same reason as in the between-chain-only case?
2. If I want multithreading but not hyperthreading, should I choose k so that cores * k <= 12, where k is the argument in threads = threading(k)? Since between-chain parallelization is faster than within-chain, I’ll pretty much always want to set cores = chains = 4 and therefore k = 3. This would run one thread per physical core while using all 12 physical cores. However, maybe I misunderstand k, and it is instead the number of logical cores to use per physical core; in that case, since each physical core has only two logical processors, setting k > 2 wouldn’t be possible.
3. The most confusing part is when I want to use hyperthreading so that I use all 24 logical cores (or close to it) corresponding to my 12 physical cores.
(A) Does this require cores > chains? And is it true that this is NOT guaranteed to be faster?
(B) What are the constraints on how these values relate to each other? Must the constraint cores * k <= 24 be met? If so, the decision would be between settings such as cores = 6, threads = threading(4) and cores = 12, threads = threading(2)?
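For reference, this is a sketch of the Case 2 setup from question 2, written under my own (possibly wrong) assumption that the total number of threads in use is chains * k. The variable names `physical_cores`, `n_chains`, and `threads_per_chain`, as well as the toy data and formula, are hypothetical; it also assumes cmdstanr and CmdStan are installed.

```r
library(brms)

# Hypothetical toy data, for illustration only
set.seed(1)
dat <- data.frame(x = rnorm(100))
dat$y <- 1 + 2 * dat$x + rnorm(100)

# Assumed hardware from my example; parallel::detectCores(logical = FALSE)
# may report the physical core count, but it is not reliable on every platform.
physical_cores <- 12
n_chains <- 4

# One thread per physical core under my assumption: 12 %/% 4 = 3
threads_per_chain <- physical_cores %/% n_chains

fit_within <- brm(
  y ~ x,
  data    = dat,
  chains  = n_chains,
  cores   = n_chains,                      # run all 4 chains in parallel
  threads = threading(threads_per_chain),  # k = 3 threads within each chain
  backend = "cmdstanr"
)
```

For the hyperthreading variant in question 3, the only change I would make is setting `physical_cores` to 24 (the logical core count), giving k = 6 with 4 chains, if that is even a valid configuration.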