RStudio crashes working with brms

I’m dipping my toes into the brms package, currently working with ordinal regression. Unfortunately, I’m experiencing a lot of crashes. Here is a screen dump from the latest one:


This occurred when applying loo(fitt2), where fitt2 is the fitted model (conveniently shown in the screen dump after the crash), but previously the same thing happened just after fitt2 was fitted. Not sure if it’s related, but I consistently get the following warning message after fitting:

Warning message:
In system(paste(CXX, ARGS), ignore.stdout = TRUE, ignore.stderr = TRUE) :
  'C:/rtools40/usr/mingw_/bin/g++' not found

Info about the data: The data is questionnaire data with ordered responses (5 levels) regarding farmers’ perceptions of innovation needs in response to wildlife damage. 222 farmers (ID) were asked 6 questions (Q), and the data frame DF contains 1332 observations. These first two analyses are merely trying to rank the responses to the questions by their coefficients, with (fitt2) and without (fitt) accounting for farmer ID random effects.

Any ideas what’s going on?

So the crash doesn’t occur during sampling, just when you call loo on the end result?

Does the crash occur when you don’t use multiple cores? (i.e., loo(fitt2, cores=1))

That warning message is safe to ignore.

The crash has not occurred during sampling. It has occurred just as the chains finished (I couldn’t capture the exact moment); other times that step works just fine. It only occasionally crashes when running loo.

How much RAM does your computer have, and how much of it gets used during sampling and the loo() call? Running out of RAM can cause R to crash.
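If it helps to see which objects in the session are holding the most memory (for example a large brmsfit), a quick base-R check like the sketch below can be run after fitting. largest_objects() is a hypothetical helper, not part of brms, and utils::object.size() is only approximate: it does not count memory reachable through environments or external pointers, so it can under-report for stanfit objects. Task Manager is still the better guide to total usage.

```r
# Hypothetical helper: list the n largest objects in an environment, in MB.
# object.size() is approximate and skips memory held via environments/pointers.
largest_objects <- function(n = 5, env = globalenv()) {
  nms <- ls(envir = env)
  if (length(nms) == 0) return(numeric(0))
  mb <- vapply(nms, function(nm) {
    as.numeric(utils::object.size(get(nm, envir = env))) / 1024^2
  }, numeric(1))
  head(sort(mb, decreasing = TRUE), n)
}

# Usage: largest_objects()  # e.g. after fitting, before calling loo()
```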

The computer has 16 GB RAM. A lot seems to be used by other things running in the background, but there is at least 4 GB available when sampling and 5 GB when running loo. But I guess it’s very possible something else was running in the background that used up a lot of memory. Either way, I can no longer get it to crash whatever I do, which is a good thing. Will come crying again if the problem recurs.

You’re not alone. I had the same idiosyncratic crashing with various versions of RStudio and numerous attempts at a clean install. It was usually associated with a predict, fitted, or pp_check call once the model had finished. I suspect it’s some interaction between RStan and RStudio, but I was never able to get diagnostic info. My “solution” was to revert to R 3.6.3 and RStan 2.19, and it’s working great.

Similarly with me. When using the brms package, at apparently random points during post-processing of the fitted model (whether it’s pairs() plots, pp_check() calls, or other access to the posterior draws), I get the old-timey-bomb “R Session Aborted” message in RStudio. I’m running R v4.0.2, brms v2.13.5, and rstan v2.21.2. The upshot is that brms is now unusable. Is the only solution to revert to R 3.x and earlier versions of brms and rstan?

This is related to the issue https://github.com/stan-dev/rstan/issues/844. See also

I had/have the same problem. It crashed when I fitted several (complex) models in a row. I found the following workaround, which reduced the crashing significantly: instead of calling brm directly, use a combination of make_standata and rstan's sampling, then create an empty brmsfit object via brm(..., empty = TRUE) and plug the fit into it.

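library(brms)
library(rstan)
# brmsformula, brmsdata, priors, warmup, iter, chains and thin are your own objects
stanmod <- stan_model(model_code = make_stancode(brmsformula, data = brmsdata, prior = priors))  # compile once
sdata <- make_standata(brmsformula, data = brmsdata, prior = priors)
stanfit <- sampling(stanmod, data = sdata, warmup = warmup, iter = iter, chains = chains, thin = thin)
brms_fit <- brm(brmsformula, prior = priors, data = brmsdata, empty = TRUE)
brms_fit$fit <- stanfit
brms_fit <- rename_pars(brms_fit)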

This method is described in the manual on page 27 (https://cran.r-project.org/web/packages/brms/brms.pdf)

If it crashes when running loo, you can set the argument pointwise to TRUE, e.g.:

loo(fit1, fit2, pointwise = TRUE)

This will be slower, but much preferred to having R crash. I’ve had to do this many times when comparing large model fits.

I’m sharing my recent experience with this problem; maybe it helps find a solution:

I recently updated from R 3.6.2 to R 4.0.3 and updated RStudio to 1.3.1093. I deleted C:/rtools (I don’t know the exact folder name) and got Rtools40. brm ran OK, but afterwards RStudio often (though not always) crashed, e.g. with fitted(mod), resid(mod), loo(mod).
So I pointed RStudio back to R 3.6.2, deleted Rtools40, and installed another Rtools version (which R “invited” me to install when I wanted to run my first brm model); it is now in C:/RBuildTools/3.5. No R session aborts since then.
I hope that some day it will work with R 4.x.x.

thanks!

Has a solution to this problem (other than reverting to R 3.x.x) been found yet? I’m getting exactly this problem: it happens after sampling, typically in posterior predictive checks.

Just to be clear, has anyone had issues like this in setups other than Windows + RStudio? Does it happen in R in the terminal?
(or RGui, or in any other operating system?)

I can’t rule out that it has happened on a Linux + R virtual machine: all I know is that sometimes, when I log back into my Google Cloud instance and reattach to my R session with screen -r, R has terminated and the remaining code to be run was entered into the Linux terminal.

As for Windows + R versus RStudio, I don’t think the same issue happens in plain R. I’ve been running some models for nearly a day now without R shutting down; however, I’m still running into R not doing what I want it to do. I think it has something to do with the way rstan sets up parallel cores.

Current Attempts & Results
I’m fitting multiple 2PL IRT models using brms, and I’m using loo_compare() to compare the various model specifications. As a result, after each model is compiled and sampled, I call add_criterion(..., criterion = "loo"). Since I just want to get all the models run right now (and I’ve already fit all the models on a different dataset), I follow each add_criterion() with the next model fit, so the script looks something like this:

fit1 <- brm(...)
fit1 <- add_criterion(fit1, criterion = "loo")
fit2 <- brm(...)
fit2 <- add_criterion(fit2, criterion = "loo")

and so on for 18 different models. After the first model was fit, the call to add the LOOIC produced the following error output:

Error in serialize(data, node$con) : error writing to connection
Error in serialize(data, node$con) : error writing to connection

The next model then compiled and sampled without any issues, but when it finished, I got 10 of the following warnings:

In for (i in 1:codeCount) { :
closing unused connection # (<-LAPTOP-...:11781)

Then the call to add_criterion(...) resulted in the same serialize error. The remaining models all compile and sample (so far) without additional errors or warnings, but every add_criterion(...) fails with the same double-printed serialization error.

Past Attempts & Results
When using RStudio for the initial trials of these models, I experienced the same seemingly random crashes. My experience is that the crashes have nothing to do with the complexity of the models and more to do with the number of models, or the number of post-processing calls that pick up options(mc.cores = parallel::detectCores()).

I suspected that this might have something to do with memory demands and started using gc() more often to help things, but I found that gc() itself would very frequently cause an RStudio crash or, less frequently, print out the closing unused connection # warnings. I have no idea why this actually works, but I started doing the following when having to run multiple models in RStudio:

library(brms)

options(mc.cores = 4) # specifically avoiding the call to parallel::detectCores()
fit1 <- brm(...)

options(mc.cores = 1)
save(fit1, file = "...file path...")
gc()

options(mc.cores = 1)
fit1 <- add_criterion(fit1, criterion = "loo")
save(fit1, file = "...file path...")
gc()

options(mc.cores = 4)
fit2 <- brm(...)

and so on. This seemed to avoid the random crashes, but when I started checking the models with pp_check(), the crashes resumed. Again, the crashes seem more linked to the number of times that certain functions are called than to the complexity of the models. This is well outside my expertise, but it seems like the issue is related to parallel operations and memory together rather than to either one independently. I’m not sure why using save(...) seems to avoid the gc() crashes, but it did in my experience, and it typically resulted in the “closing unused connections” warning instead. Similarly, I found that I had to keep manually changing the mc.cores option and that, anecdotally at least, specifying a fixed number of cores rather than relying on parallel::detectCores() extended the time I could keep an RStudio session working before a crash.
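The manual switching of mc.cores described above can be wrapped so that the option is always restored, even when a call errors. This is a minimal base-R sketch; with_cores() is a hypothetical helper, and it assumes the brms/loo functions you call fall back to getOption("mc.cores") when no cores argument is given (passing cores = explicitly, where a function accepts it, does the same job).

```r
# Hypothetical helper: set options(mc.cores) for one call, then restore it.
with_cores <- function(cores, expr) {
  old <- options(mc.cores = cores)   # options() returns the previous value
  on.exit(options(old), add = TRUE)  # restored even if expr throws an error
  expr                               # lazily evaluated here, after the option is set
}

# Usage (placeholders as in the script above):
# fit1 <- with_cores(4, brm(...))
# fit1 <- with_cores(1, add_criterion(fit1, criterion = "loo"))
```

This only changes how the option is managed; it does not address whatever is causing the crashes.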

Also, for whatever it’s worth, when working with previous versions of R and the corresponding brms versions, I would only get RStudio crashes when post-processing models; however, since updating I’ve noticed that I occasionally get RStudio crashes between when rstan finishes sampling and when the fitted brms object is available in the R environment. Using the file = "...file path..." argument within the brm call, it seems that those crashes occur before the object is saved as well.

Just to echo @wgoette, I’ve been running into the same issues, with similar suspicions about the parallel implementation, for a few months now. Making sure I pass cores = 1 to things like pp_check and loo does seem to help. I thought maybe it was because my machine is a little unorthodox (dual Xeon with 36/72 cores on Windows 10). For example, if I use parallel::detectCores(), I get 72 cores opened to start a loo, which is maybe a bit over the top. It is good to know that others are experiencing similar issues, whatever the case.
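Along the same lines, a capped core count avoids opening dozens of workers on a large machine. A sketch, assuming a cap of 4 is sensible for your hardware; pick_cores() is a hypothetical helper. parallel::detectCores(logical = FALSE) only reports physical cores on some platforms and returns NA when the count is unknown, which the helper treats as a single core.

```r
# Hypothetical helper: leave one core free and never exceed max_cores.
pick_cores <- function(max_cores = 4L) {
  n <- parallel::detectCores(logical = FALSE)  # physical cores where supported
  if (is.na(n)) n <- 1L                        # detectCores() returns NA if unknown
  as.integer(max(1L, min(max_cores, n - 1L)))  # never below 1
}

# Usage: options(mc.cores = pick_cores())  # for sampling
#        loo(fit, cores = 1)               # single core for post-processing
```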

Lately I have resorted to saving everything as .rda and restarting RStudio, then loading the .rda files back in after any single brms fit or loo operation. That seems to be the only sure-fire way to avoid losing work or being disappointed by surprise crashes. I do still get some crashes for other operations that use parallel, though. For example, a kfold run I’ve been trying lately, which takes a couple of days, keeps crashing at the end with messages like (I forget exactly) Error in unserialize(socklist[[n]]) or, probably, Error in serialize(data, node$con) : error writing to connection.
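The save-and-reload routine can be partly automated so that a crash only costs the step that was running. For model fits, brm() already has a file argument that saves the fitted model and reloads it on later runs; for other expensive steps (loo, kfold) a small base-R wrapper does the same. cache_rds() and the file names below are hypothetical. This does not prevent crashes, it only avoids recomputing finished work, and a saved file has to be deleted by hand if the model changes.

```r
# Hypothetical helper: evaluate expr once, save it with saveRDS(), reuse it afterwards.
cache_rds <- function(path, expr) {
  if (file.exists(path)) {
    return(readRDS(path))  # reuse the saved result after a restart
  }
  value <- expr            # expr is only evaluated when no saved copy exists
  saveRDS(value, path)
  value
}

# Usage (fit1 and the file names are placeholders):
# fit1 <- brm(..., file = "fit1")  # brm's own caching, writes fit1.rds
# loo1 <- cache_rds("loo1.rds", loo(fit1, cores = 1))
```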

In any case, thanks for everyone’s hard work on looking into these issues. These crashes are mostly just a minor annoyance for me.

-Roy

Thanks for all these details. I think these kinds of detailed logs of issues are going to be incredibly useful for the rstan devs. That’s not me, but I would say that I prototype on a Mac in RStudio (it’s a decent IDE but still the buggiest thing on my Mac), then, when it is going to be a big job, run it on Linux with R in the terminal. I trust RStudio about as far as I can throw a piano.

It’s a pain working in two parts like this (and the Linux terminal part means being without graphics) but not as much as putting pieces together after an RStudio crash.

I have never used brms, but that’s just because I’m a “boring old dude” (says my daughter) rather than any problem with it. I just think there should be as few intermediary layers between you and the compiled binaries for your model as possible.

On that note, I am switching more and more work from rstan to cmdstanr. It might help with memory problems too. Also, check out the memory chapter in Wickham’s Advanced R. It is not the solution to everything (or anything much) but it does clear up some issues and myths, like using gc(). Apparently that hasn’t been useful since early days / S+.

One more data point: I’m working with R 4.0.5 / Rtools40 in RStudio on Windows and experiencing frequent but unpredictable crashes with brms that do seem to be specific to that package.

Lately I’ve been fitting a lot of models in rstanarm for a particular project (and also, more generally, lots of native Stan models via rstan) and have not encountered any crashes while sampling or post-processing. I needed to expand some binomial GLMMs to include zero-inflation, so I moved to brms. The ZIB models I’m fitting now are otherwise basically identical to the binomial models I was fitting in rstanarm, but almost immediately I started getting the dreaded bomb. It has mostly happened (albeit stochastically) when calling posterior_predict() on a brmsfit object, but a few minutes ago I got my first crash during sampling. I have not gotten any of the even-more-dreaded serialize / unserialize errors.

I habitually set mc.cores = parallel::detectCores(logical = FALSE) - 1 (which equals 3 on my machine) and there should be plenty of RAM. The data are not huge, the models are not especially complex, and the results all look fine. Nevertheless, I tried using posterior_predict(..., cores = 1) and for a while that seemed to “work” until this latest crash during sampling. I haven’t tried running chains serially, but that would be too slow to be useful in practice.

I don’t have nearly as much experience with brms as with rstanarm (although it is a pretty amazing package), so I can’t say whether this behavior is new. But at least in my case, it seems fairly clear that whatever the issue is, it’s specific to brms. So far it’s not quite unusable – I’ve been saving .RData after each fit and just hoping I can make it through this analysis – but this would definitely influence my choice of tools going forward.

Happy to provide session info if requested, though this seems like a more general problem.

I am sorry to hear that you are experiencing crashes. As you can imagine, this is very hard to debug without a reprex, which is of course hard to produce for seemingly random crashes. Did you try updating all of rstan, brms, R, and RStudio to see if that resolves it?

Oh, for sure. Given how frustrating it is debugging seemingly random issues in my own Stan-dependent packages, no doubt it’s worse when you have actual users moaning about it. ;-) I wasn’t singling out brms (a quite remarkable package!) but just hoping that the observation that the issue seems to be specific to brms, rather than general to rstan or rstan-dependent packages, might provide a clue. If that’s the case, for example, it’s not clear to me why parallelization and/or memory usage would differ between, say, rstanarm and brms – although as @wgoette said, I’m well outside my expertise here.

Re updating, I thought everything was current, but I see R 4.1.0 is just out. I’ll try updating and report back if that seems to help.

I’ve updated to the most recent R, RStudio, rstan, and brms today. I do still get the crashes when post-processing. I haven’t had any crashes in compiling or sampling, though.

-----EDIT: Extra Detail-----

Just had another random crash with a call to predictive_error(), which I’d tried because I had another crash with residuals() before that. I have an inkling that it has to do with memory and RStudio. The model was a very simple linear regression (single outcome, 4 predictors, Gaussian likelihood), but it was fit to imputed data. So the call to brm_multiple() was for 5 imputed datasets (each n = 3046) with 3000 iterations (1000 warmup) for the standard 4 chains per model. I’ve had similar kinds of issues (I think) with versions of this same data, except that RStudio wouldn’t crash; it would just return the "cannot allocate vector of size # GB" error. I’ve got around that before by subsetting the predictive_error() call, which might address the issue here. I know that RStudio + Windows can be a little unpredictable and testy about memory usage.
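Some rough arithmetic shows why this model is heavy: if brm_multiple() pools the draws from all imputations, 5 imputations × 4 chains × 2000 post-warmup iterations gives 40,000 draws, so a full draws-by-observations predictive_error() matrix is 40,000 × 3,046 doubles, about 0.97 GB for a single copy before any intermediate copies. Subsetting, as described above, keeps each piece small. A sketch of that idea, where apply_in_chunks(), fit, df, and the chunk size of 500 are all hypothetical; summarising each chunk (e.g. with colMeans()) instead of keeping the whole matrix is what actually lowers peak memory.

```r
# Hypothetical helper: call f() on consecutive blocks of observation indices
# and combine the per-block results (cbind for matrices, c for vectors).
apply_in_chunks <- function(n, chunk_size, f, combine = cbind) {
  ids <- seq_len(n)
  chunks <- unname(split(ids, ceiling(ids / chunk_size)))
  do.call(combine, lapply(chunks, f))
}

# Usage (fit and df are placeholders; df must contain the response):
# mean_err <- apply_in_chunks(nrow(df), 500, function(i) {
#   colMeans(predictive_error(fit, newdata = df[i, , drop = FALSE]))
# }, combine = c)
```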

Here’s my session info for reference:
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
RAM: 32 GB (31.7 GB usable)

Per Task Manager, I’m sitting at about 2% CPU usage, 24% memory, and 1% disk.

----- Additional Edit-----
Apologies for the extra edits; I clearly should’ve gathered my thoughts before responding. I’ve been working for the last couple of days with RStudio pretty much constantly open and running brms models, including post-processing via add_criterion(), model_weights(), and conditional_effects(). From Friday morning to Sunday morning, there were no crashes despite continuously running and updating models. Today, I’m getting the random crashes after running only one or two fits on similar data with much less complex models. That may be a side effect of updating everything this morning, but I also changed how I was specifying mc.cores in the options. During the continuous computing, I was specifying mc.cores directly (specifically 12, though my PC has 16 available cores), and I would switch to a single core for all the post-processing and then back up to 12 for model sampling. Today, however, I’ve just been setting the standard options(mc.cores = parallel::detectCores()) at the start and then running through the rest of my script. Not sure that’s the cause per se, but it’s something else that occurred to me.