I am having trouble parallelizing brms::kfold.
I fit a model with brm using chains = 4 and cores = 4, which took about 17 minutes.
I then ran:
k <- brms::kfold(fit, K = 10, folds = "stratified", group = "group",
chains = 4, cores = 4)
This follows the suggestion in Running folds in parallel on windows using kfold? · Issue #593 · paul-buerkner/brms · GitHub.
I assume this should fit each model the same way as the original model, i.e. with between-chain parallelization, since future is not used (as opposed to within-chain or between-fold parallelization; see kfold.brmsfit function - RDocumentation). I get the message “fitting model 1 of 10”, which is great. I expect each model to take about as long as the original, but the second model does not start fitting until more than an hour later.
I have also attempted
fit <- brms::add_criterion(fit, "kfold", K = 10, folds = "stratified",
group = "group", chains = 4, cores = 4)
with the same result.
I also tried:
options(mc.cores = 4)
k_grp <- loo::kfold_split_stratified(K = 10, x = df$group)
k <- rstanarm::kfold(fit, folds = k_grp, cores = 1)
as suggested in K-fold cross-validation — kfold.stanreg • rstanarm, to fit the models sequentially using 4 cores per model (1 per chain), which I believe is how my original model was fit (although with no chains argument, I am curious how this is handled). This took about 19 minutes per model on average.
I see that brms::kfold uses the loo::kfold_helpers when folds and group are specified, but is it using rstanarm for the actual fitting? It would be nice to stay within brms if possible.
As a side note, the brms and rstanarm documentation appear to disagree on how to use cores to speed up computation.
The brms manual (https://cran.r-project.org/web/packages/brms/brms.pdf) says:
“Number of cores to use when executing the chains in parallel, which defaults to
1 but we recommend setting the mc.cores option to be as many processors as
the hardware and RAM allow (up to the number of chains).”
whereas rstanarm says (K-fold cross-validation — kfold.stanreg • rstanarm):
“The Markov chains for each model will be run sequentially. This will often be the most efficient option, especially if many cores are available, but in some cases it may be preferable to fit the K models sequentially and instead use the cores for the Markov chains.”
I understand that one passage concerns fitting a single model and the other concerns k-fold, but I would expect the advice to line up, since k-fold just fits the model multiple times. According to Running brms models with within-chain parallelization, it is more nuanced than sequential chains often being the most efficient option (specifically, “Within-chain parallelization is less efficient than between-chain parallelization”). I may be misinterpreting this, though.
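For reference, the two ways of spending 4 cores discussed above can be written side by side. This is only a sketch under assumptions: fit is the brmsfit from earlier, "group" is a column in its data, the future package is installed, and 4 simultaneous refits fit in RAM. The helper names kfold_between_chains and kfold_between_folds are made up for illustration and are not part of brms. The between-fold variant follows the future-based example in the kfold.brmsfit documentation.

```r
# Hypothetical helpers (names are illustrative, not part of brms).
# Both assume `fit` is a brmsfit and that "group" is a column in its data.

# Option A: folds run one after another; each refit spreads its 4 chains
# over up to `cores` cores (between-chain parallelization).
kfold_between_chains <- function(fit, K = 10, cores = 4) {
  brms::kfold(fit, K = K, folds = "stratified", group = "group",
              chains = 4, cores = cores)
}

# Option B: several folds run at once via the future package; each refit
# uses a single chain on a single core (between-fold parallelization).
kfold_between_folds <- function(fit, K = 10, workers = 4) {
  # multisession starts background R sessions, which also works on Windows
  old_plan <- future::plan(future::multisession, workers = workers)
  on.exit(future::plan(old_plan), add = TRUE)  # restore the previous plan
  brms::kfold(fit, K = K, folds = "stratified", group = "group",
              chains = 1, cores = 1)
}
```

Wrapping each call in system.time(), e.g. system.time(k <- kfold_between_chains(fit)), would show which option is faster for a given model. Note that Option B draws from one chain per fold, so the two options do not produce the same number of posterior draws.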
Any help would be greatly appreciated.
- Operating System: Windows 10
- brms: 2.16.1
- loo: 2.4.1
- rstanarm: 2.21.1