I am running kfold() on a stanreg object with options()$mc.cores set to 10, on a machine with 12 cores. If I pass “cores = 10” in the kfold() call, I get an error: “unused argument (cores = 10)”. When I leave that argument out, assuming that the function will check options()$mc.cores, kfold() appears to run the 10 folds sequentially rather than in parallel.
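For reference, the setup looks roughly like the sketch below. Here `fit` is only a placeholder name for the stanreg object, and the kfold() calls are commented out so that the snippet runs without the fitted model.

```r
# Session setup: request 10 cores via the global option.
options(mc.cores = 10)
getOption("mc.cores")        # the value that was set above

# Number of logical cores R can see on this machine.
n_cores <- parallel::detectCores()

# `fit` is a placeholder for the stanreg object returned by stan_lmer().
# kfold(fit, K = 10, cores = 10)  # fails here with: unused argument (cores = 10)
# kfold(fit, K = 10)              # runs, but the folds appear to go one after another
```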
As far as I can judge from watching CPU use in Task Manager, each fold appears to use four cores: baseline use is about 10%, and there are 10 bumps up to about 40%. I suppose that kfold() calls each fold sequentially and relies on existing, already parallelized code to run each fold.
It seems to me that it would be faster to parallelize kfold() itself across the 10 folds, even if that requires telling the code that runs each fold to use only 1 core. Nowadays, many virtual desktops have many more than four cores available, and the default in kfold() is 10 folds. Do I understand correctly what is happening?
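To make the idea concrete, here is a rough sketch of fold-level parallelism using only base R's parallel package. It is not how kfold() is implemented: the simulated data, the helper `fit_one_fold()`, and the use of lm() as a stand-in for the Stan fit are all assumptions made purely for illustration.

```r
library(parallel)

# Simulated data, purely illustrative.
set.seed(1)
n <- 200
dat <- data.frame(x = rnorm(n))
dat$y <- 1 + 2 * dat$x + rnorm(n)

K <- 10
fold_id <- sample(rep(seq_len(K), length.out = n))  # random fold assignment

# Hypothetical helper: fit on all folds except k, then score held-out fold k.
# lm() is only a stand-in; a real version would refit the Stan model here,
# restricted to a single core per worker.
fit_one_fold <- function(k, dat, fold_id) {
  train <- dat[fold_id != k, ]
  test  <- dat[fold_id == k, ]
  fit   <- lm(y ~ x, data = train)
  mu    <- predict(fit, newdata = test)
  # Log predictive density of the held-out observations.
  sum(dnorm(test$y, mean = mu, sd = summary(fit)$sigma, log = TRUE))
}

# A PSOCK cluster is used because forking (mclapply) is not available on Windows.
cl <- makeCluster(2)  # 2 workers for the sketch; more on a bigger machine
fold_lpd <- parLapply(cl, seq_len(K), fit_one_fold, dat = dat, fold_id = fold_id)
stopCluster(cl)

# Sum over folds gives a K-fold estimate of predictive performance.
elpd_kfold <- sum(unlist(fold_lpd))
```

With 10 workers, each fold would run on its own core at the same time, instead of each fold using four cores one after another.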
Incidentally, I am using kfold() because running loo::loo itself takes a looooooong time, even on a pretty fast machine with 12 cores and 64 GB of main memory. I am fitting a stan_lmer model with 124 subjects, each with 4 to 8 measurement occasions, and only 2 or 3 main-effect covariates. Is it typical for loo to take so long?
- Operating System: Windows 10, updated
- rstanarm Version: rstanarm 2.18.1