RAM keeps increasing until crash when running many brms/Stan models in parallel with futures

In my project, I plan to run thousands of Bayesian multilevel models using the brms package, which is time-consuming. So I am trying to use the furrr package to run these models in parallel. Everything works, but RAM usage keeps increasing until R crashes. As shown in the attached figure, memory usage ramped up overnight from Friday to Saturday, maxed out at ~250 GB (an insane amount of RAM), and then R crashed.
[Attached figure: memory usage over time]

I already call rm(fit) and gc() after saving the model results. I also tried plan(callr, workers = parallel::detectCores()-1) instead of plan(multisession, workers = parallel::detectCores()-1), but it did not make a big difference.

I am wondering whether you have any experience with running brms in parallel efficiently and quickly, and how to solve the memory problem.

Really looking forward to your reply.

Hi, welcome to Stan discourse. It looks like some kind of memory leak, but it is a bit difficult to answer your question without seeing details of the code you ran. Can you share that as well?

Here are the details of the code:

Multilevel Model

rm(list=ls())

Load Packages

library(tidyverse)
library(dplyr)
library(rstan)
library(lme4)
library(brms)
library(tidybayes)
library(parallel)
library(furrr)

Set working directory

setwd("…/data")
data_path <- "…/Analysis"

Load data

Create index

index <- crossing(
Event = c(unique(nested.psw_small$Event), "none"), # Event_length=15
Trait = unique(bfi_wide$Trait), # Trait_length=5
RE = c("int", "intslope"), # RE_length=2
Match = c("matched", "unmatched")) %>% # match_length = 2
filter(!(RE == "int" & Event == "none") &
!(Match == "matched" & Event == "none") &
!(RE == "int" & Match == "unmatched"))

Create multilevel model function

growth_fun <- function(event, trait, re, match){

lapply(1:10, function(x){

rstan_options(auto_write = TRUE)

## set prior
Prior <-  c(set_prior("cauchy(0,1)", class = "sd"), set_prior("cauchy(0,1)", class = "sigma"))

# get formula
if(re == "intslope" && event != "none"){
  f <- value ~ new.wave*le.group + (new.wave|PROC_SID)
}else if (re == "int" && match == "matched"){
  f <- value ~ new.wave*le.group + (1|PROC_SID)
}else{
  f <- value ~ new.wave + (new.wave|PROC_SID)
}

## set specifications for models with convergence issues
if((trait == "A" && event %in% c("ChldMvOut", "MoveIn", "ChldBrth")) ||
   (trait == "E" && event %in% c("ParDied")) ||
   (trait == "C" && event %in% c("Retire")) ||
   (trait == "O" && event %in% c("Unemploy")) ||
   event %in% c("PartDied", "FrstJob", "Divorce", "LeftPar")){
  Iter <- 8000; Warmup <- 4000; treedepth <- 20
} else {Iter <- 2000; Warmup <- 1000; treedepth <- 10}


## run models
fit <- brm(formula = f,  data = df,  prior = Prior, iter = Iter, warmup = Warmup,
           control = list(adapt_delta = 0.99, max_treedepth = treedepth))

file <- sprintf("%s/data/Results_data/Test/%s_%s_%s_chain%s_%s.RData", data_path, trait, event, match, x, re)
save(fit, file = file)
rm(fit)
gc()

})
}

Run multilevel model

plan(multisession, workers = parallel::detectCores()-1, gc = TRUE)

index %>%
mutate(mod = future_pmap(list(Event, Trait, RE, Match), growth_fun))

I don't immediately see where the memory leak is. Perhaps future does not clean up well after itself? What happens if you remove f as well? Formulas store their environments, and perhaps this is what keeps cluttering the RAM (although it should not happen in this case). Further things to try: save brmsfit objects not via save but via brm's built-in file argument, and see if using backend = "cmdstanr" makes things better. To get the most out of the latter, you may want to use the latest GitHub version of brms.
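A minimal sketch of the last two suggestions, assuming brms and a working cmdstanr installation are available. The helper names model_file and fit_and_store are hypothetical (they are not brms functions); only the file and backend arguments of brm come from the suggestions above.

```r
# Hypothetical helper: build one file stem per model run.
# No extension is given here; the sketch assumes brm() appends ".rds" itself.
model_file <- function(dir, trait, event, match, re, run) {
  file.path(dir, sprintf("%s_%s_%s_run%s_%s", trait, event, match, run, re))
}

# Hypothetical helper: fit one model, let brm() write it to disk via `file`,
# and return only the path so the worker does not hand the large brmsfit
# object back to the main R session.
fit_and_store <- function(formula, data, prior, file, ...) {
  fit <- brms::brm(
    formula = formula, data = data, prior = prior,
    backend = "cmdstanr",  # assumes cmdstanr and CmdStan are installed
    file = file,           # brm() saves the fit itself instead of save()
    ...
  )
  rm(fit)
  invisible(paste0(file, ".rds"))
}
```

Inside growth_fun, the brm(...) and save(...) calls would then collapse into a single call such as fit_and_store(f, df, Prior, model_file(dir, trait, event, match, re, x), iter = Iter, warmup = Warmup), where dir is whatever output directory is in use.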

@paul.buerkner
Hi, thanks for your reply.

I tried everything you mentioned above.

  1. I removed everything I could, rm(df, sub, f, fit, file), but the memory leak persisted.
  2. I used the built-in file argument of brm instead of calling save afterwards, but that did not solve the memory leak either. I even got an error message after the code had run successfully for several hours, grep: write error: No space left on device, although there was enough space in my home directory.
  3. Using backend = "cmdstanr" did make things better: I got better speed, but no improvement in memory.

Since future perhaps does not clean up well after itself, I tried replacing the future package with the foreach package. RAM usage seems less insane than before, but a new problem has arisen: foreach provides no speedup with 48 cores (。•́︿•̀。)

I have not solved the problem yet.

I am sorry to hear that. I am regularly using foreach to parallelise brms models, so this should generally work. Without more details, I am not sure I can give you any more input right now.

@paul.buerkner
Thank you for the quick reply.

I totally agree that foreach should work, because I also tried the same code (using foreach to parallelise brms models) on my own MacBook Pro (8 cores), where it works well and does speed things up. But when I use it on the virtual machine running RStudio Server (Linux, 48 cores, 252 GB), there is no speedup any more. So weird…

By the way, when you use foreach to parallelise brms models, does your computer's CPU/RAM usage stay stable?

Just a quick question, because I had that problem when running many models in parallel: can you tell whether the problem happens during compilation or sampling? (From the timeline it looks like compilation should be done, but maybe not.) In my case, precompiling simple versions of the models and then updating them again and again helped.
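The precompile-then-update idea could be sketched roughly as follows. This assumes that brm(..., chains = 0) compiles the model without sampling and that update() on the resulting object accepts newdata and recompile = FALSE; compile_template, refit_all and the handle callback are hypothetical names introduced for illustration.

```r
# Hypothetical helper: compile the Stan model once without sampling.
# Assumption: chains = 0 makes brm() return a compiled but unsampled fit.
compile_template <- function(formula, data, prior, ...) {
  brms::brm(formula = formula, data = data, prior = prior, chains = 0, ...)
}

# Hypothetical helper: refit the compiled template to each data set without
# recompiling. `handle` is applied to every fit (for example to save it or to
# extract a summary) so that only its return value is kept in memory.
refit_all <- function(template, datasets, handle = identity, ...) {
  lapply(datasets, function(d) {
    fit <- stats::update(template, newdata = d, recompile = FALSE, ...)
    handle(fit)
  })
}
```

With the default handle = identity every full fit is kept in the returned list, so for thousands of models it is probably better to pass a function that writes the fit to disk and returns only the file path.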

Other than that, I also have the problem that parallelization via doParallel and foreach does not work on my cluster setup for some reason (same speed as 1 core), and, at least when working in RStudio, I don't think I have ever seen rm() + gc() reduce the RAM used. So far I have been too lazy to figure out why, as I could just restart RStudio, but I would be curious about a reason/solution.

@Yunrui Are you, by any chance, running the code on a Linux cluster using the Intel compiler / MKL BLAS libraries?

Yes, I am using a Linux VM to run my model.

When the parallelization does not work (same speed as 1 core), I suggest you check your code; it is very important to set x (1:n) in the right way.

We are investigating a problem where it seems that on certain Linux cluster setups cmdstanr can't be parallelized. One possible cause could be the use of either the Intel compiler or the MKL BLAS libraries, hence the question.

@paul.buerkner
Because R apparently does not release memory automatically (R FAQ) and furrr is a super RAM eater, I solved my problem by splitting the whole big task into several chunks, so that each sub-task can finish within the memory limit.

Thanks again for all your help.
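The chunking workaround described above might look roughly like this in base R. The names make_chunks, run_in_chunks and run_chunk are hypothetical, and the chunk size has to be chosen by trial so that one chunk fits into the available RAM; this is a sketch under those assumptions, not the exact code that was used.

```r
# Split task numbers 1..n_tasks into consecutive chunks of at most chunk_size.
make_chunks <- function(n_tasks, chunk_size) {
  stopifnot(n_tasks >= 1, chunk_size >= 1)
  ids <- seq_len(n_tasks)
  split(ids, ceiling(ids / chunk_size))
}

# Process a task table (such as `index`) one chunk at a time.
# `run_chunk` is a user-supplied function that receives the rows of one chunk.
run_in_chunks <- function(index, chunk_size, run_chunk) {
  chunks <- make_chunks(nrow(index), chunk_size)
  for (rows in chunks) {
    run_chunk(index[rows, , drop = FALSE])
  }
  invisible(length(chunks))
}

# Hypothetical usage with the objects from this thread (not run here).
# Switching back to plan(sequential) shuts down the worker sessions, so the
# memory they hold is returned to the operating system between chunks.
# run_in_chunks(index, chunk_size = 20, run_chunk = function(chunk) {
#   future::plan(future::multisession, workers = 8)
#   furrr::future_pmap(
#     list(chunk$Event, chunk$Trait, chunk$RE, chunk$Match), growth_fun
#   )
#   future::plan(future::sequential)
# })
```

Running each chunk from a fresh R process (for example one Rscript call per chunk) is a stricter variant of the same idea.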

I noticed that, so in the end I didn't use backend = "cmdstanr".