I am trying to use the new reduce_sum functionality in brms (which is a huge, appreciated upgrade!), but am getting a non-intuitive (to me) error message.
The model I am trying to fit is complex and has taken weeks to run using brms in the past. A few weeks ago when I learned about the new reduce_sum functionality, I amended the model to leverage it and started it again. The model finished compiling at the 3 week mark, but did not return the expected brms object; rather, I received this disappointing error message:
The model has successfully run to completion with brms before using across-chain parallelization (but not within-chain parallelization), and I can run the model using the epilepsy data set in the brms vignette with within-chain parallelization successfully.
Speeding up a Bernoulli-logit type model with reduce_sum is hard, and whether you actually gain anything depends on the details. So you should first check on a sub-sampled data set whether your model really speeds up.
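To make that check concrete, here is a minimal sketch of how one might time a short run with and without within-chain threading on a random subset of the data. The helper names `subsample_rows` and `time_fit` are made up for illustration, the Bernoulli family default is an assumption based on the comment above, and the cmdstanr backend is assumed to be installed; `formula` and `data` stand in for the real model.

```r
# Hypothetical helper: draw a random subset of rows so test runs finish quickly.
subsample_rows <- function(data, n, seed = 1) {
  set.seed(seed)
  data[sample(nrow(data), min(n, nrow(data))), , drop = FALSE]
}

# Hypothetical helper: time one short brms fit and return elapsed seconds.
# threads_per_chain = NULL means no within-chain parallelization.
time_fit <- function(formula, data, threads_per_chain = NULL,
                     family = brms::bernoulli(), iter = 200) {
  timing <- system.time(
    brms::brm(
      formula, data = data, family = family,
      chains = 4, cores = 4, iter = iter,
      backend = "cmdstanr",  # assumption: cmdstanr is set up on this machine
      threads = brms::threading(threads_per_chain)
    )
  )
  timing[["elapsed"]]
}

# Usage (not run here; `my_formula` and `my_data` are placeholders):
# small    <- subsample_rows(my_data, 5000)
# t_plain  <- time_fit(my_formula, small)
# t_thread <- time_fit(my_formula, small, threads_per_chain = 3)
```

If the threaded timing is not clearly lower on the subset, it is probably not worth committing a multi-week run to it. Note that both timings include compilation, so longer test runs give a fairer comparison.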
Sorry to hear that your long run simply crashed. I would suggest you download the more recent version of brms from CRAN, or even go with the GitHub version of brms, as there were a few fixes for reduce_sum. It’s still odd to hear that the model ran just fine for a long time and then crashed.
Since you are on Windows and you are struggling with runtime… you may want to consider using the WSL emulation of Linux and running things in that environment (it’s still under Windows). People have reported significant speedups doing that.
Thanks for the advice! The runtime speed-up was substantial with reduce_sum (~1 month with 3000 iterations with 4 chains vs. 3 weeks with 4000 iterations with 4 chains and 3 cores per chain); it was just disappointing to crash at the end.
I’ll try the developer version of brms first and report back if I still have an issue.
I ran the same model for a small number of iterations (200) with across- and within-chain parallelization. The model again finished running, but yielded the following error message and did not return a brms object:
Error: Supplied CSV file is corrupt!
I still have the R session open.
The model has in excess of 100k parameters, with a little over 300k observations.
Can you run tempdir()? You should see something like "C:\\Users\\Rok\\AppData\\Local\\Temp\\RtmpWWF8hi". Then check the folder above the reported one, in my case "C:\\Users\\Rok\\AppData\\Local\\Temp\\", and see whether any of its subfolders contain recently generated .csv files.
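If clicking through the folders is tedious, something along these lines should do the same search from the open R session. It is only a sketch in base R; the function name `recent_csv_files` and the 24-hour window are arbitrary choices for illustration.

```r
# Hypothetical helper: list .csv files under the folder above tempdir()
# that were modified within the last `hours` hours, newest first.
recent_csv_files <- function(root = dirname(tempdir()), hours = 24) {
  files <- list.files(root, pattern = "\\.csv$", recursive = TRUE,
                      full.names = TRUE, ignore.case = TRUE)
  info <- file.info(files)
  cutoff <- Sys.time() - hours * 3600
  keep <- !is.na(info$mtime) & info$mtime >= cutoff
  out <- data.frame(
    file = files[keep],
    size_bytes = info$size[keep],
    modified = info$mtime[keep],
    stringsAsFactors = FALSE
  )
  out[order(out$modified, decreasing = TRUE), ]
}

# Usage: print whatever turned up in the last day.
# print(recent_csv_files())
```

The file sizes are worth a look too: an output file that is much smaller than its siblings, or empty, would hint at an interrupted write.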
I updated my brms install to the latest developer version and tried a short run (200 iterations) again. The model again ran but, when finished, provided the same error:
Popping back in to provide an update: I followed @wds15’s suggestion to try WSL, and the model successfully executed (still not quite converged, but progress)! So, I assume there is an issue with Windows per @rok_cesnovar’s comment?