I’ve reviewed the documentation for parallelization within and between chains in brms but find it confusing, particularly the multithreading vignette, "Running brms models with within-chain parallelization". I’m hoping you might be able to answer some questions to (dis)confirm my understanding.
To help me understand how the arguments of brms functions work, let’s consider an example in which I have 12 physical cores and 24 logical cores and will always use 4 chains.
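For anyone who wants to check the equivalent numbers on their own machine, here is a minimal sketch using base R's `parallel` package. The variable names are my own, and as far as I know the `logical = FALSE` option is not honoured on every platform, so the physical count may come back as `NA` or equal to the logical count.

```r
library(parallel)

# Physical cores (12 in my example). May be NA or match the logical count
# on platforms where R cannot tell the two apart.
physical_cores <- detectCores(logical = FALSE)

# Logical cores, i.e. hardware threads including hyperthreading (24 in my example).
logical_cores <- detectCores(logical = TRUE)

cat("physical:", physical_cores, " logical:", logical_cores, "\n")
```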
Does setting `cores = chains` generally maximize performance?
My understanding is that setting `cores > chains` will not offer any improvement because each chain can only run on one core. Accordingly, my 12 physical cores won’t offer any improvement over a machine with 4 physical cores.
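To spell out the arithmetic behind that understanding, here is a small sketch in base R. It only encodes my assumption that each chain occupies exactly one core when there is no within-chain threading; it does not call brms, and the variable names are made up for illustration.

```r
chains <- 4

# Candidate values for the `cores` argument.
cores_requested <- c(1, 2, 4, 12)

# Assumption: one chain per core, so at most `chains` cores can be busy at once.
chains_in_parallel <- pmin(cores_requested, chains)

print(data.frame(cores_requested, chains_in_parallel))
```

If this assumption holds, going from `cores = 4` to `cores = 12` leaves the number of chains running at once unchanged at 4.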
- Will setting `cores > chains` fail to improve performance for the same reason as in the between-chain-only case?
- If I want multithreading but not hyperthreading, I can set `cores * k = 12`, where `k` is the scalar in `threads = threading(k)`? Since between-chain parallelization is faster than within-chain, I’ll pretty much always want to set `cores = chains` and therefore `k = 3`. This will use one thread per physical core but use all physical cores. However, maybe I misunderstand `k` and it is the number of logical cores per physical core to use: since each physical core has only two logical processors, setting
- The most confusing part is when I want to use hyperthreading so that I use all 24 logical cores (or close to it) corresponding to my 12 physical cores.
(A) Does this require `cores > chains`? Is this NOT guaranteed to be faster?
(B) What are the constraints as to how these values relate to each other? Must the constraint `24 >= cores * k` be met? So now the decision would be between values such as `cores = 6, threads = threading(4)` and `cores = 12, threads = threading(2)`?
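To make the options I am weighing concrete, here is a rough sketch of the counting I have in mind. It assumes `cores = chains`, so that all chains run at once, and that `k` in `threading(k)` is the number of threads per chain, which is exactly the part I would like confirmed. `thread_options` is a hypothetical helper of my own, not a brms function.

```r
# Hypothetical helper: list every k for which chains * k fits in a thread budget.
# Assumes cores = chains and that k means threads per chain.
thread_options <- function(chains, budget) {
  k <- seq_len(budget %/% chains)
  data.frame(chains = chains, k = k, total_threads = chains * k)
}

# Budget of 12: one thread per physical core (no hyperthreading).
physical <- thread_options(chains = 4, budget = 12)

# Budget of 24: one thread per logical core (with hyperthreading).
logical <- thread_options(chains = 4, budget = 24)

print(physical)
print(logical)

# The corresponding call would look something like this (not run here;
# the vignette uses the cmdstanr backend for threading):
# fit <- brm(formula, data = dat, chains = 4, cores = 4,
#            threads = threading(3), backend = "cmdstanr")
```

Under these assumptions, the largest `k` is 3 when staying on physical cores and 6 when using all logical cores.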