Problem parallelizing rstanarm::loo()

I am running rstanarm and loo in RStudio (R 3.5.1) on a virtual desktop running Windows 10 with 12 (virtual) cores and 64 GB of main memory. options(mc.cores = parallel::detectCores()) is set, and I have confirmed that options()$mc.cores is 12. I have fit a hierarchical repeated-measures model using rstanarm::stan_lmer with 2 or 3 main effects, 128 individuals, and about 2000 observations. The model converges well.

Then I ran loo 2.0.0 on the model. The function took over 3 hours to return. This did not seem consistent with the loo documentation, which describes it as a fast way to calculate elpd. So I reran the command and monitored memory and CPU use in Task Manager. The function appears to use only one core, as CPU use never rises above 8.3% (one core out of 12). I retried with the parameter cores = 12 set explicitly. The result was the same, as was the result of running the function in plain R outside RStudio. I have also repeated the experiment on my personal laptop, with the same result.

Either I am making some sort of error that I can't figure out, or there may be a bug in loo(). Have any of you had this problem? What might I be doing wrong? Thanks in advance to anyone who can help me with this.
Larry Hunsicker

Does it use all 12 cores if you first call

options(loo.cores = 12)

?

See the loo 2.0 documentation page "Efficient approximate leave-one-out cross-validation (LOO)" for how to set the number of cores used. It is intentional that the mc.cores option is not used directly, as loo may sometimes use a lot of memory.
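A minimal sketch of the two ways to ask loo() for more cores: loo.cores is a global option, so it is set through options() rather than passed to loo() itself, while the per-call setting is loo()'s cores argument. It assumes rstanarm's small mtcars example model and a core count of 2; in later loo 2.x releases the loo.cores option is documented as deprecated, so it may emit a warning there.

```r
library(rstanarm)
library(loo)

fit1 <- stan_glm(mpg ~ wt, data = mtcars, refresh = 0)
n_cores <- 2L  # assumption: set to the number of cores you want loo() to use

# Session-wide: loo.cores is a global option, set with options(),
# not an argument of loo().
options(loo.cores = n_cores)
loo_opt <- loo(fit1)
options(loo.cores = NULL)  # clear the option again

# Per call: pass the count through loo()'s cores argument.
loo_arg <- loo(fit1, cores = n_cores)

print(loo_arg)
```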

How long did fitting the model take?

How many posterior draws?

A faster loo is coming soon, and your case would be a great test case for it.

The stan_lmer model returned in 5 1/4 minutes. There was no problem with the stan_lmer call, which used 4 cores. My stan_lmer call used the rstanarm defaults: four chains, each of which returned 1000 draws after 1000 discarded burn-in draws.

More importantly, I think I have figured out where the problem was. I thought the problem might be with my Windows version of loo(). I had had to compile it from source, since evidently the compiled Windows version of loo hasn't reached CRAN yet. But I have Ubuntu 16.04 available in an Oracle VirtualBox VM, with R installed, so I tried loo there. The call returned in 1 min 15 s! But I noticed that I had assigned only one core to the VM. So I returned to my Windows installation, set options(mc.cores = NULL), and reran the call to loo(). Again, it returned in 1.25 minutes! So there is no problem running loo on Windows as long as it uses only one core.

When I ran loo last night using 4 cores on my home machine, the function had not returned after 12 hours. Task Manager showed that the machine was still churning away at 100% CPU use, with memory use climbing to my full 8 GB, then dropping to near 0, then climbing again, over and over. NOW, HOWEVER: when I interrupted the loo() call within R, Task Manager showed that the memory was still in use and CPU use was still 100%. It also showed over 200 R processes. This continued until I finally exited R itself. Then it took my laptop several minutes to stop all the R processes and free the memory.

It seems that there is a problem in the loo() multicore code that (at least on Windows) spawns worker processes and then loses track of them. I haven't written any parallel code myself, so I can't guess what the specific issue is. But clearly loo works fine as long as it uses only one core, and gets into some sort of perpetual loop when it is asked to use more than one core.

I hope that this helps.
Larry Hunsicker
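The fix found above was to clear the mc.cores option before calling loo(). If you would rather not leave the option cleared for the rest of the session, a small base-R helper can clear it for a single call and restore it afterwards. with_single_core() is a hypothetical name, not part of loo or rstanarm, and the sketch assumes, as this thread found, that clearing mc.cores is enough to keep loo() on one core.

```r
# Hypothetical helper: evaluate one expression with the mc.cores option
# cleared, then put the previous value back (even if the expression fails).
with_single_core <- function(expr) {
  old <- options(mc.cores = NULL)    # options() returns the previous value(s)
  on.exit(options(old), add = TRUE)  # restore on normal exit or on error
  expr  # lazy argument: evaluated here, after the option has been cleared
}

# Demonstration with base R only; in the thread's setting the expression
# would be a call such as loo(fit).
options(mc.cores = 4)
inside <- with_single_core(getOption("mc.cores"))
inside                 # NULL: the option was cleared while expr ran
getOption("mc.cores")  # 4: restored afterwards
```

Because R evaluates function arguments lazily, the wrapped call only runs when expr is first used inside the function, which is after the option has been cleared.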


I misread your reply and tried to run loo() with loo.cores = 12 as an argument. As I am sure you know, this failed. Now that I have reread your message, I'll try setting options(loo.cores = 12) instead of putting it into the function call. But see my reply to Vehtari below. I think there is a problem in the loo multicore code. Thanks for your reply. And, really, many, many thanks for rstanarm and loo. They are a great addition to statistics.

Incidentally, the examples at the bottom of the loo() documentation may provide a reproducible example of this sort of behavior:

library(rstanarm)
fit1 <- stan_glm(mpg ~ wt, data = mtcars)
options(mc.cores = NULL)
stuff1 <- loo(fit1)
stuff2 <- loo(fit1, mc.cores = 2)
stuff3 <- loo(fit1, mc.cores = 4)

On my 4-core Windows 10 laptop, stuff1 returns almost instantaneously. stuff2 took 8 minutes. stuff3 uses 100% CPU and shows the same sawtooth memory use that I described above. It returned after 6.5 minutes. But none of these failed to return the way my larger example did. It seems that calling too many cores leads to memory thrashing, and that may lead to the "never return" behavior I found running my bigger model.
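To repeat this comparison and record the timings side by side, a small harness like the following may help. It is a sketch that assumes the same mtcars model and passes the core count through loo()'s cores argument (the argument name used in the loo documentation) rather than mc.cores; run it in a fresh session so that an .Rprofile setting does not interfere.

```r
library(rstanarm)
library(loo)

fit1 <- stan_glm(mpg ~ wt, data = mtcars, refresh = 0)

# Time loo() once for each core count and keep the wall-clock seconds.
core_counts <- c(1L, 2L)
elapsed <- vapply(core_counts, function(k) {
  system.time(loo(fit1, cores = k))[["elapsed"]]
}, numeric(1))

print(data.frame(cores = core_counts, elapsed_sec = round(elapsed, 1)))
```

Using wall-clock ("elapsed") time rather than user time matters here, because work done in separate worker processes is not counted in the parent session's user time.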

Thanks for reporting with this simple example. I tested on Linux (Ubuntu 16.04) and I don't see any problem, so it seems to be a Windows problem.

Can you open an issue at Issues · stan-dev/loo · GitHub, or at least report here a few more details about your Windows setup, e.g. the output of sessionInfo() or devtools::session_info('rstanarm')?

I'll open an issue as you suggest if I can figure out how to do that. Meanwhile, here are my sessionInfo() and devtools::session_info('rstanarm') outputs:

sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] graphics grDevices utils datasets stats methods base

other attached packages:
[1] haven_1.1.2 loo_2.0.0 rstanarm_2.18.1 Rcpp_1.0.0
[5] survival_2.43-1 forcats_0.3.0 stringr_1.3.1 dplyr_0.7.7
[9] purrr_0.2.5 readr_1.1.1 tidyr_0.8.2 tibble_1.4.2
[13] ggplot2_3.1.0 tidyverse_1.2.1

loaded via a namespace (and not attached):
[1] nlme_3.1-137 matrixStats_0.54.0 xts_0.11-2
[4] lubridate_1.7.4 threejs_0.3.1 httr_1.3.1
[7] rstan_2.18.2 tools_3.5.1 backports_1.1.2
[10] R6_2.3.0 DT_0.5 lazyeval_0.2.1
[13] colorspace_1.3-2 withr_2.1.2 tidyselect_0.2.5
[16] gridExtra_2.3 prettyunits_1.0.2 processx_3.2.0
[19] compiler_3.5.1 cli_1.0.1 rvest_0.3.2
[22] xml2_1.2.0 shinyjs_1.0 colourpicker_1.0
[25] scales_1.0.0 dygraphs_1.1.1.6 ggridges_0.5.1
[28] callr_3.0.0 digest_0.6.18 StanHeaders_2.18.0
[31] minqa_1.2.4 base64enc_0.1-3 pkgconfig_2.0.2
[34] htmltools_0.3.6 lme4_1.1-18-1 htmlwidgets_1.3
[37] rlang_0.3.0.1 readxl_1.1.0 rstudioapi_0.8
[40] shiny_1.2.0 bindr_0.1.1 zoo_1.8-4
[43] jsonlite_1.5 crosstalk_1.0.0 gtools_3.8.1
[46] inline_0.3.15 magrittr_1.5 bayesplot_1.6.0
[49] Matrix_1.2-15 munsell_0.5.0 yaml_2.2.0
[52] stringi_1.2.4 debugme_1.1.0 MASS_7.3-51.1
[55] pkgbuild_1.0.2 plyr_1.8.4 grid_3.5.1
[58] parallel_3.5.1 promises_1.0.1 crayon_1.3.4
[61] miniUI_0.1.1.1 lattice_0.20-38 splines_3.5.1
[64] hms_0.4.2 ps_1.2.1 pillar_1.3.0
[67] igraph_1.2.2 markdown_0.8 shinystan_2.5.0
[70] codetools_0.2-15 reshape2_1.4.3 stats4_3.5.1
[73] rstantools_1.5.1 glue_1.3.0 modelr_0.1.2
[76] nloptr_1.2.1 httpuv_1.4.5 cellranger_1.1.0
[79] gtable_0.2.0 assertthat_0.2.0 mime_0.6
[82] xtable_1.8-3 broom_0.5.0 later_0.7.5
[85] rsconnect_0.8.8 shinythemes_1.1.2 bindrcpp_0.2.2

devtools::session_info('rstanarm')

  • Session info ------------------------------------------------------------
    setting value
    version R version 3.5.1 (2018-07-02)
    os Windows >= 8 x64
    system x86_64, mingw32
    ui RStudio
    language (EN)
    collate English_United States.1252
    ctype English_United States.1252
    tz America/Chicago
    date 2018-11-10

  • Packages ----------------------------------------------------------------
    ! package * version date lib source
    assertthat 0.2.0 2017-04-11 [1] CRAN (R 3.5.0)
    backports 1.1.2 2017-12-13 [1] CRAN (R 3.5.0)
    base64enc 0.1-3 2015-07-28 [1] CRAN (R 3.5.0)
    bayesplot 1.6.0 2018-08-02 [1] CRAN (R 3.5.1)
    BH 1.66.0-1 2018-02-13 [1] CRAN (R 3.5.0)
    bindr 0.1.1 2018-03-13 [1] CRAN (R 3.5.0)
    bindrcpp 0.2.2 2018-03-29 [1] CRAN (R 3.5.0)
    bitops 1.0-6 2013-08-17 [1] CRAN (R 3.5.0)
    callr 3.0.0 2018-08-24 [1] CRAN (R 3.5.1)
    cli 1.0.1 2018-09-25 [1] CRAN (R 3.5.1)
    colorspace 1.3-2 2016-12-14 [1] CRAN (R 3.5.0)
    colourpicker 1.0 2017-09-27 [1] CRAN (R 3.5.0)
    crayon 1.3.4 2017-09-16 [1] CRAN (R 3.5.0)
    crosstalk 1.0.0 2016-12-21 [1] CRAN (R 3.5.0)
    desc 1.2.0 2018-05-01 [1] CRAN (R 3.5.0)
    digest 0.6.18 2018-10-10 [1] CRAN (R 3.5.1)
    dplyr * 0.7.7 2018-10-16 [1] CRAN (R 3.5.1)
    DT 0.5 2018-11-05 [1] CRAN (R 3.5.1)
    dygraphs 1.1.1.6 2018-07-11 [1] CRAN (R 3.5.1)
    fansi 0.4.0 2018-10-05 [1] CRAN (R 3.5.1)
    ggplot2 * 3.1.0 2018-10-25 [1] CRAN (R 3.5.1)
    ggridges 0.5.1 2018-09-27 [1] CRAN (R 3.5.1)
    glue 1.3.0 2018-07-17 [1] CRAN (R 3.5.1)
    gridExtra 2.3 2017-09-09 [1] CRAN (R 3.5.0)
    gtable 0.2.0 2016-02-26 [1] CRAN (R 3.5.0)
    gtools 3.8.1 2018-06-26 [1] CRAN (R 3.5.0)
    htmltools 0.3.6 2017-04-28 [1] CRAN (R 3.5.0)
    htmlwidgets 1.3 2018-09-30 [1] CRAN (R 3.5.1)
    httpuv 1.4.5 2018-07-19 [1] CRAN (R 3.5.1)
    igraph 1.2.2 2018-07-27 [1] CRAN (R 3.5.1)
    inline 0.3.15 2018-05-18 [1] CRAN (R 3.5.0)
    jsonlite 1.5 2017-06-01 [1] CRAN (R 3.5.0)
    labeling 0.3 2014-08-23 [1] CRAN (R 3.5.0)
    later 0.7.5 2018-09-18 [1] CRAN (R 3.5.1)
    lattice 0.20-38 2018-11-04 [1] CRAN (R 3.5.1)
    lazyeval 0.2.1 2017-10-29 [1] CRAN (R 3.5.0)
    lme4 1.1-18-1 2018-08-17 [1] CRAN (R 3.5.1)
    loo * 2.0.0 2018-04-11 [1] CRAN (R 3.5.0)
    magrittr 1.5 2014-11-22 [1] CRAN (R 3.5.0)
    markdown 0.8 2017-04-20 [1] CRAN (R 3.5.0)
    MASS 7.3-51.1 2018-11-01 [1] CRAN (R 3.5.1)
    Matrix 1.2-15 2018-11-01 [1] CRAN (R 3.5.1)
    matrixStats 0.54.0 2018-07-23 [1] CRAN (R 3.5.1)
    mgcv 1.8-25 2018-10-26 [1] CRAN (R 3.5.1)
    mime 0.6 2018-10-05 [1] CRAN (R 3.5.1)
    miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 3.5.0)
    minqa 1.2.4 2014-10-09 [1] CRAN (R 3.5.0)
    munsell 0.5.0 2018-06-12 [1] CRAN (R 3.5.0)
    nlme 3.1-137 2018-04-07 [1] CRAN (R 3.5.0)
    nloptr 1.2.1 2018-10-03 [1] CRAN (R 3.5.1)
    packrat 0.4.9-3 2018-06-01 [1] CRAN (R 3.5.0)
    pillar 1.3.0 2018-07-14 [1] CRAN (R 3.5.1)
    pkgbuild 1.0.2 2018-10-16 [1] CRAN (R 3.5.1)
    pkgconfig 2.0.2 2018-08-16 [1] CRAN (R 3.5.1)
    PKI 0.1-5.1 2017-09-16 [1] CRAN (R 3.5.0)
    plogr 0.2.0 2018-03-25 [1] CRAN (R 3.5.0)
    plyr 1.8.4 2016-06-08 [1] CRAN (R 3.5.0)
    prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.5.0)
    processx 3.2.0 2018-08-16 [1] CRAN (R 3.5.1)
    promises 1.0.1 2018-04-13 [1] CRAN (R 3.5.0)
    ps 1.2.1 2018-11-06 [1] CRAN (R 3.5.1)
    purrr * 0.2.5 2018-05-29 [1] CRAN (R 3.5.0)
    R6 2.3.0 2018-10-04 [1] CRAN (R 3.5.1)
    RColorBrewer 1.1-2 2014-12-07 [1] CRAN (R 3.5.0)
    Rcpp * 1.0.0 2018-11-07 [1] CRAN (R 3.5.1)
    RcppEigen 0.3.3.4.0 2018-02-07 [1] CRAN (R 3.5.0)
    RCurl 1.98-0 2018-04-28 [1] local
    reshape2 1.4.3 2017-12-11 [1] CRAN (R 3.5.0)
    R RJSONIO [?]
    rlang 0.3.0.1 2018-10-25 [1] CRAN (R 3.5.1)
    rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.5.0)
    rsconnect 0.8.8 2018-03-09 [1] CRAN (R 3.5.0)
    rstan 2.18.2 2018-11-07 [1] CRAN (R 3.5.1)
    rstanarm * 2.18.1 2018-10-21 [1] CRAN (R 3.5.1)
    rstantools 1.5.1 2018-08-22 [1] CRAN (R 3.5.1)
    rstudioapi 0.8 2018-10-02 [1] CRAN (R 3.5.1)
    scales 1.0.0 2018-08-09 [1] CRAN (R 3.5.1)
    shiny 1.2.0 2018-11-02 [1] CRAN (R 3.5.1)
    shinyjs 1.0 2018-01-08 [1] CRAN (R 3.5.0)
    shinystan 2.5.0 2018-05-01 [1] CRAN (R 3.5.0)
    shinythemes 1.1.2 2018-11-06 [1] CRAN (R 3.5.1)
    sourcetools 0.1.7 2018-04-25 [1] CRAN (R 3.5.0)
    StanHeaders 2.18.0 2018-10-07 [1] CRAN (R 3.5.1)
    stringi 1.2.4 2018-07-20 [1] CRAN (R 3.5.1)
    stringr * 1.3.1 2018-05-10 [1] CRAN (R 3.5.0)
    survival * 2.43-1 2018-10-29 [1] CRAN (R 3.5.1)
    threejs 0.3.1 2017-08-13 [1] CRAN (R 3.5.0)
    tibble * 1.4.2 2018-01-22 [1] CRAN (R 3.5.0)
    tidyselect 0.2.5 2018-10-11 [1] CRAN (R 3.5.1)
    utf8 1.1.4 2018-05-24 [1] CRAN (R 3.5.0)
    viridisLite 0.3.0 2018-02-01 [1] CRAN (R 3.5.0)
    withr 2.1.2 2018-03-15 [1] CRAN (R 3.5.0)
    xtable 1.8-3 2018-08-29 [1] CRAN (R 3.5.1)
    xts 0.11-2 2018-11-05 [1] CRAN (R 3.5.1)
    yaml 2.2.0 2018-07-25 [1] CRAN (R 3.5.1)
    zoo 1.8-4 2018-09-19 [1] CRAN (R 3.5.1)

[1] C:/Larry/R/win-library/3.5
[2] C:/Program Files/R/R-3.5.1/library

R – Package was removed from disk.

If you want to learn, go to Issues · stan-dev/loo · GitHub, click "New issue", and paste in the message with the mtcars example and the session info.
Just tell me if you would prefer that I do it.

Sorry to take so long to get back to you. I now have a reproducible example of the problem. It took me quite a bit of time to track it down. It turns out that loo() multiprocessing breaks down on Windows 10 when the .Rprofile file contains the line:
options(mc.cores = parallel::detectCores())
Oddly, adding this line at the start of a script doesn't cause loo to fail with multiple cores. That's one reason it took me so long to isolate the problem.
Try running the following in a clean RStudio session on a Windows 10 machine:

library(rstanarm)
sleepstudy <- lme4::sleepstudy
fm1 <- stan_lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
options(mc.cores = NULL)
system.time(loo(fm1))
options(mc.cores = 2)
system.time(loo(fm1))

On my Windows 10 machine, if there is no options(mc.cores) line in the .Rprofile file, the above runs successfully, though the second call, with mc.cores set to 2, runs very slowly. (Once you have run lines 2 and 3 once and sleepstudy and fm1 are in your environment, you can comment them out to speed things up.) But when the options(mc.cores) line is in .Rprofile, the second loo() call never returns. If you watch what is happening in Task Manager, you will see that the CPU never stops churning. If you halt the command by pressing Esc, R returns you to the > prompt, but Task Manager shows that the churning continues until you exit R completely.

I have now confirmed this on two Windows 10 machines: my home laptop and my UIowa 12-CPU virtual desktop. I have also confirmed your finding that this does not happen when I run the above in my VirtualBox Ubuntu machine. I have no idea why this is happening. I thought the search() order might be different, but it was exactly the same whether the options(mc.cores) line was in .Rprofile or added to the beginning of the above script.
I have noticed a batch of other issues with rstanarm::loo, but I'll put these in separate notes to keep the various issues apart. I hope this reproducible example will help you find the bug. I'll review how to submit an issue as you suggested, but it may take me a day or so to learn how to do that. Let me know how the above works for you.
Larry Hunsicker
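One way to act on the .Rprofile finding without giving up multi-core sampling everywhere is to make the setting conditional on the operating system. This is a hypothetical workaround sketch, not a documented fix: it simply leaves mc.cores unset on Windows, where loo() hung in this thread, and sets it on other systems.

```r
# Hypothetical ~/.Rprofile snippet (workaround sketch, not an official fix).
# On Windows, leave mc.cores unset so loo() stays on one core;
# elsewhere, use all detected cores by default.
if (.Platform$OS.type != "windows") {
  options(mc.cores = parallel::detectCores())
}
```

On Windows you would then request parallel chains in the fitting call itself, e.g. stan_lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy, cores = 4), since rstanarm passes cores through to rstan's sampler. The parallel:: prefix works in .Rprofile because it loads the namespace without attaching the package.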


Thanks for the detailed information. Can you test this on a version of Windows other than 10? Or can someone else reading this thread test it?

No. I have only my laptop and my access to the UIowa Research Desktop. But I suspect things will be the same on other versions of Windows. As I am sure you know, Windows does not do well with parallel, because Windows can't fork a process the way all the *nix systems do. One has to create an explicit cluster, using snow or one of the other packages. However, stan_lmer and the other rstanarm functions seem to have solved the parallel problems.
Because rstanarm didn't have a compiled Windows package for a long time, I had to compile my copy from source. Now that a precompiled Windows version of rstanarm is available, do you suppose I should reinstall rstanarm? Perhaps there was something odd that my version of Rtools didn't handle well.
Larry Hunsicker
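The fork-versus-cluster difference described above can be sketched with base R's parallel package, which incorporates snow's socket clusters. This is a generic illustration, not loo's internal code; par_apply() is a hypothetical helper name.

```r
library(parallel)

# Hypothetical helper: apply FUN to each element of X in parallel, using
# whichever mechanism the current operating system supports.
par_apply <- function(X, FUN, cores = 2L) {
  if (.Platform$OS.type == "windows") {
    # No fork() on Windows: launch separate R worker processes
    # (a socket-based PSOCK cluster) and send FUN and X to them.
    cl <- makePSOCKcluster(cores)
    # Ask the workers to shut down when this function exits, even on error.
    on.exit(stopCluster(cl), add = TRUE)
    parLapply(cl, X, FUN)
  } else {
    # Unix-alikes fork the running R session; the children see the
    # parent's memory copy-on-write, so nothing has to be exported.
    mclapply(X, FUN, mc.cores = cores)
  }
}

squares <- par_apply(1:10, function(i) i^2)
unlist(squares)
```

The Windows branch pays extra start-up and data-transfer costs for every call, because each worker is a fresh R process that must receive its data over a socket. The on.exit() call is a design choice aimed at the stray-process symptom reported earlier in the thread, though a worker that is busy computing will only see the shutdown request once its current task finishes.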

Reinstalling rstanarm using the now available compiled package for Windows made no difference. The behavior is exactly as described above.

Is using more than one core with the loo() function (loo 2.6.0) still a known issue on Windows?
I tested the example Lawrence_Hunsicker mentioned on Windows 10 and 11 and got the same result: it takes much longer with cores = 2.