loo::kfold: How is the task parallelized?

I am running kfold() on a stanreg object with options()$mc.cores set to 10, on a machine with 12 cores. If I include the argument “cores = 10” in the kfold() call, I get an error: “unused argument (cores = 10)”. When I leave that argument out, assuming that the function will check options()$mc.cores, kfold() appears to run the 10 folds sequentially rather than in parallel.

As far as I can judge by watching CPU use in Task Manager, each fold appears to use four cores (initial use is 10%, but there are 10 bumps up to 40%). I suppose that kfold is calling each fold sequentially and relying on existing, already parallelized code to run each fold.

It seems to me that it would be faster to parallelize the code for kfold itself, calling the 10 folds, even if that requires telling the code that runs each fold to use only 1 core. Nowadays, many virtual desktops have many more than four cores available, and the default in kfold() is for 10 folds. Do I understand correctly what is happening?

Incidentally, I am using kfold because running loo::loo itself takes a looooooong time, even on a pretty fast machine with 12 cores and 64 GB of main memory. I am running a stan_lmer model with 124 subjects, each with 4-8 times of measurement, and only 2 or 3 main-effect covariates. Is it typical for loo to take so long?

Larry Hunsicker

  • Operating System: Windows 10, updated
  • rstanarm Version: rstanarm 2.18.1

Hi Larry, sorry we missed this one and no one responded sooner, but you’re right that the parallelization of kfold should be done by fold. We haven’t done an rstanarm release in a while but we are planning one soon and if I have time before then I’ll change it so that cores can be used in the way you intended. If not then it will happen in the subsequent release (I just opened an issue on GitHub: https://github.com/stan-dev/rstanarm/issues/339).


Just noting in this thread that this is a known problem.

EDIT: realized that it’s the same Lawrence.

I have a better version of k-fold parallelization working as part of this pull request

so it will probably get into the next release.

kfold now parallelizes by fold instead of by Markov chain (unless otherwise specified) and has its own documentation page. The new cores argument to kfold can be used as follows:

# Example code demonstrating the different ways to specify the number 
# of cores and how the cores are used

# starting with no mc.cores set
options(mc.cores = NULL)
 
# spread the K models over N_CORES cores (method 1, cores argument)
kfold(fit, K, cores = N_CORES)

# spread the K models over N_CORES cores (method 2, mc.cores option)
options(mc.cores = N_CORES)
kfold(fit, K)

# fit K models sequentially using N_CORES cores for the Markov chains each time
options(mc.cores = N_CORES)
kfold(fit, K, cores = 1)

On Windows it uses parallel::parLapply with a PSOCK cluster, and for Mac and Linux it uses parallel::mclapply.
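For readers curious what parallelizing by fold looks like, here is a minimal sketch of the general pattern using only base R and the parallel package. It is not the rstanarm implementation: fit_one_fold() and run_folds() are hypothetical helper names, and lm() on mtcars stands in for refitting a Stan model on each training set.

```r
library(parallel)

# Hypothetical helper: fit on all folds except `k`, then score fold `k`.
# lm() on mtcars is a stand-in for refitting a Stan model.
fit_one_fold <- function(k, data, folds) {
  train <- data[folds != k, , drop = FALSE]
  test  <- data[folds == k, , drop = FALSE]
  fit   <- lm(mpg ~ wt, data = train)
  mean((test$mpg - predict(fit, newdata = test))^2)
}

# Hypothetical wrapper: run the K folds in parallel, one fold per worker.
run_folds <- function(data, K, cores) {
  set.seed(1)
  # Randomly assign each row to one of K folds of (nearly) equal size.
  folds <- sample(rep_len(seq_len(K), nrow(data)))
  if (.Platform$OS.type == "windows") {
    # Forking is unavailable on Windows, so start a PSOCK cluster.
    cl <- makePSOCKcluster(cores)
    on.exit(stopCluster(cl))
    res <- parLapply(cl, seq_len(K), fit_one_fold, data = data, folds = folds)
  } else {
    # On Mac and Linux, fork the current R process.
    res <- mclapply(seq_len(K), fit_one_fold, data = data, folds = folds,
                    mc.cores = cores)
  }
  unlist(res)
}

# Held-out mean squared error for each of the 4 folds.
mse_by_fold <- run_folds(mtcars, K = 4, cores = 2)
```

With a real Stan model, each worker would presumably run its chains on a single core, which is the trade-off discussed earlier in the thread.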

Just came across this wondering about the same issue, although I am using brms.

I am running on a cluster and currently have 24 cores reserved, but it seems add_ic(m, ic='kfold') is also going through the models sequentially using only 4 cores. I have options(mc.cores=24) and options(loo.cores=24) at the top of my script.

Is there a way to do this in brms or, if not, are there any plans to make this change in brms as well @paul.buerkner ?

As of brms 2.8.0 (current CRAN version) you can use the future package to parallelize the model fits in kfold and related methods. See ?kfold.brmsfit for an example after updating brms.
