I’ve reviewed the documentation for parallelization within and between chains in `brms` but find it confusing, particularly the multithreading vignette, "Running brms models with within-chain parallelization". I’m hoping you might be able to answer some questions to (dis)confirm my understanding.
To help me understand how the arguments of `brms` functions work, let’s consider an example in which I have 12 physical cores and 24 logical cores and will always use 4 chains.
Case 1: Between-chain parallelization only
- Does `cores = chains` generally maximize performance? My understanding is that setting `cores > chains` will not offer any improvement because each chain can only run on one core. Accordingly, my 12 physical cores won’t offer any improvement over a machine with 4 physical cores.
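To make Case 1 concrete, here is a minimal sketch of the setup I have in mind. The model and the `epilepsy` data (shipped with `brms`) are only stand-ins, the names `n_chains`, `n_physical`, `n_cores` and `fit_case1` are hypothetical, and the `min()` rule simply encodes my assumption above rather than documented behavior.

```r
# Case 1 sketch: between-chain parallelization only.
n_chains   <- 4
n_physical <- 12  # taken from the example machine, not detected

# Assumption under question: cores beyond the number of chains sit idle,
# so the useful number of cores is capped at the number of chains.
n_cores <- min(n_chains, n_physical)

# Wrapped in a function so that no model is compiled until it is called.
fit_case1 <- function() {
  brms::brm(
    count ~ zAge + zBase * Trt + (1 | patient),
    data   = brms::epilepsy,
    family = poisson(),
    chains = n_chains,
    cores  = n_cores
  )
}
```

Calling `fit_case1()` would then run the 4 chains on 4 cores, which is the configuration I believe cannot be improved on by raising `cores`.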
Case 2: Within-chain parallelization with backend = "cmdstanr"
- Will setting `cores > chains` fail to improve performance for the same reason as in the between-chain-only case?
- If I want multithreading but not hyperthreading, can I set `cores` and `k` such that `12 >= cores*k`, where `k` is the scalar in `threads = threading(k)`? Since between-chain parallelization is faster than within-chain parallelization, I’ll pretty much always want to set `cores = chains` and therefore `k = 3`. This will run one thread per physical core while using all 12 physical cores. However, maybe I misunderstand `k` and it is the number of logical cores to use per physical core: since each physical core has only two logical processors, setting `k > 2` wouldn’t be possible.
- The most confusing part is when I want to use hyperthreading so that I use all 24 logical cores (or close to it) corresponding to my 12 physical cores.
(A) Does this require `cores > chains`? Is this NOT guaranteed to be faster?
(B) What are the constraints on how these values relate to each other? Must the constraint `24 >= cores*k` be met? So the decision would then be between values such as `cores = 6, threads = threading(4)` and `cores = 12, threads = threading(2)`?
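To show which combinations I am choosing between, here is a small base-R sketch that lists every `(cores, k)` pair whose product exactly uses a given thread budget. The helper `candidate_settings` is hypothetical, and the rule `cores * k == budget` is my assumed constraint, which is exactly what I am asking about.

```r
# List all (cores, k) pairs with cores * k equal to the thread budget.
# `budget` is 12 for physical cores only, or 24 with hyperthreading.
candidate_settings <- function(budget) {
  grid <- expand.grid(cores = seq_len(budget), k = seq_len(budget))
  grid <- grid[grid$cores * grid$k == budget, ]
  rownames(grid) <- NULL
  grid
}

no_hyperthreading <- candidate_settings(12)  # includes cores = 4, k = 3
hyperthreading    <- candidate_settings(24)  # includes cores = 6, k = 4

print(hyperthreading)
```

Each row would correspond to a call with `cores = cores`, `threads = threading(k)` and `backend = "cmdstanr"`. What I can’t tell is which rows are valid and which would actually be faster with 4 chains.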