I’ve reviewed the documentation for parallelization within and between chains in brms but find it confusing, particularly the multithreading vignette, "Running brms models with within-chain parallelization". I’m hoping you might be able to answer some questions to (dis)confirm my understanding.
To help me understand how arguments of brms functions work, let’s consider an example in which I have 12 physical cores and 24 logical cores and will always use 4 chains.
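For reference, this is roughly how I would check those counts from R. It is only a sketch using `parallel::detectCores()`; as far as I know the `logical` argument is not honoured on every platform, so the two numbers may come out equal on some systems.

```r
library(parallel)

# Logical cores (hardware threads); 24 on the machine described above.
n_logical <- detectCores(logical = TRUE)

# Physical cores; 12 on the machine described above. On platforms that
# ignore `logical`, this may simply equal n_logical.
n_physical <- detectCores(logical = FALSE)

cat("physical:", n_physical, " logical:", n_logical, "\n")
```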
Case 1: Between-chain parallelization only
- Does `cores = chains` generally maximize performance?
My understanding is that setting `cores > chains` will not offer any improvement because each chain can only run on one core. Accordingly, my 12 physical cores won’t offer any improvement over a machine with 4 physical cores.
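To make Case 1 concrete, this is roughly the call I have in mind. The model and the `epilepsy` data are placeholders borrowed from the brms examples, not my actual analysis; only `chains` and `cores` matter for the question.

```r
library(brms)

# Placeholder model on the `epilepsy` data shipped with brms.
# Between-chain parallelization only: one core per chain, no threading.
fit_between <- brm(
  count ~ zAge + zBase * Trt + (1 | patient),
  data = epilepsy,
  family = poisson(),
  chains = 4,  # always 4 chains in my example
  cores = 4    # cores = chains
)
```

If my understanding is right, changing `cores = 4` to `cores = 12` here should make no difference to run time.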
Case 2: Within-chain parallelization with backend = "cmdstanr"
- Will setting `cores > chains` fail to improve performance for the same reason as in the between-chain-only case?
- If I want multithreading but not hyperthreading, can I choose values such that `12 >= cores * k`, where `k` is the scalar in `threads = threading(k)`? Since between-chain parallelization is faster than within-chain, I’ll pretty much always want to set `cores = chains` and therefore `k = 3`. This will use one thread per physical core while using all physical cores. However, maybe I misunderstand `k` and it is the number of logical cores to use per physical core: since each physical core has only two logical processors, setting `k > 2` wouldn’t be possible.
- The most confusing part is when I want to use hyperthreading so that I use all 24 logical cores (or close to it) corresponding to my 12 physical cores.
(A) Does this require `cores > chains`? Is this NOT guaranteed to be faster?
(B) What are the constraints on how these values relate to each other? Must the constraint `24 >= cores * k` be met? If so, would the decision be between values such as `cores = 6, threads = threading(4)` and `cores = 12, threads = threading(2)`?
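To spell out the arithmetic behind my Case 2 questions, here is a small base R sketch. `thread_budget()` is a hypothetical helper of my own, not part of brms. It tabulates `cores * k` as I wrote it above, alongside an alternative count that assumes my Case 1 reasoning carries over (at most `chains` chains run at once). I don’t know which of the two, if either, matches what brms actually does.

```r
# Hypothetical helper (not part of brms): tabulates the thread counts implied
# by a few (cores, k) settings for a fixed number of chains.
thread_budget <- function(cores, k, chains = 4, physical = 12, logical = 24) {
  data.frame(
    cores = cores,
    k = k,
    # Budget as written in my questions: cores * k.
    cores_times_k = cores * k,
    # Assumption carried over from Case 1: at most `chains` chains run at once.
    active_chains_times_k = pmin(cores, chains) * k,
    # Checks against the 12 physical / 24 logical cores of my example machine.
    fits_physical = cores * k <= physical,
    fits_logical = cores * k <= logical
  )
}

# The three settings mentioned in this post.
budget <- thread_budget(cores = c(4, 6, 12), k = c(3, 4, 2))
print(budget)
```

Each row would correspond to a call like `brm(..., chains = 4, cores = cores, threads = threading(k), backend = "cmdstanr")`. The two count columns agree only when `cores = chains`, which is exactly where my confusion starts.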