Just a disclaimer that I posted a more or less identical question on Stack Overflow.
I recently ran a Bayesian regression on a large, messy health dataset: a longitudinal mixed-effects model with random slopes and nested intercepts (encounters within clients). As a result I have an R object comfortably larger than anything I have worked with before: 7,722,886,880 bytes (7.7 GB!).
When I tried to save this object as an .RData file, it took a long time and then failed with the error:

```
Error in gzfile(file, "wb") : cannot open the connection
In addition: Warning message:
In gzfile(file, "wb") :
  cannot open compressed file 'fooFolder/fooSubFolder/foo.RData', probable reason 'Operation canceled'
```
Which I assume means 'file's too big, holmes'. Does anyone know a way to save this object? Maybe some `options()` setting that lets me override a timeout? The model and the k-fold validation took a long time to run, and I'd rather not do it all again. I figure some people on this forum may have run into similar problems.
Start with two quick checks. They are a bit pedantic, but they will spare you additional suffering if one of them turns out to be the culprit:
- Make sure you have enough disk space to save a large file. The file size on disk, and thus the free space required, may differ from the byte count `object.size()` reports. It's hard to say by exactly how much, so for a 7.7 GB in-memory object I'd want at least 15 GB of free disk space before trying.
- Make sure you have specified just one object in `save()`. I can't count the number of times I have waited ages for a small object to save, only to realize that I was inadvertently saving every object in my workspace.
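Both checks can be sketched on a toy object; the object and file names below are made up for illustration, not taken from the original model:

```r
x <- rnorm(1e5)                          # toy stand-in for the big fit object
print(object.size(x))                    # in-memory byte count

f <- tempfile(fileext = ".RData")
save(x, file = f)                        # note: save(x, ...), one named object
file.size(f)                             # size on disk can differ from object.size()

# save.image("all.RData")                # pitfall: silently saves the whole workspace
# saveRDS(x, "x.rds")                    # single-object alternative; readRDS() to load
unlink(f)
```

`saveRDS()` is single-object by construction, so it cannot accidentally grab the rest of your workspace, and it lets you pick the variable name at load time.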
Assuming these weren't the cause, there are still a few ideas to try:
- Use a different compression algorithm via the `compress` argument to `save()`. This is more likely to shrink the file on disk than to speed up the save itself. In my experience `compress = "xz"` gives the smallest files, at the cost of slower writes.
- Switch to the `qs` package, which offers a much faster and highly tunable serialization format.
- Step back and consider whether you need to save the whole object. Do you need all of the diagnostics, initial values, etc. or could you make do with just the posterior draws?
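To try the compression idea, you can compare algorithms on a toy object first; `save()` accepts `"gzip"`, `"bzip2"`, and `"xz"`, with xz usually smallest but slowest:

```r
x <- rep(rnorm(1000), 100)               # repetitive data, so compression matters
sizes <- vapply(c("gzip", "bzip2", "xz"), function(alg) {
  f <- tempfile(fileext = ".RData")
  save(x, file = f, compress = alg)      # same object, different algorithm
  sz <- file.size(f)
  unlink(f)
  sz
}, numeric(1))
print(sizes)                             # bytes on disk per algorithm
```

On a 7.7 GB object the differences in both size and write time can be substantial, so it's worth benchmarking on a small slice before committing to one.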
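A minimal `qs` sketch (this assumes the package is installed; `qsave()`/`qread()` are its save/load pair, and the `preset` and `nthreads` arguments tune the speed-size trade-off):

```r
# install.packages("qs")                 # one-time setup
fit <- list(draws = matrix(rnorm(1e4), 1000, 10))  # toy stand-in for the model
f <- tempfile(fileext = ".qs")

qs::qsave(fit, f)                        # typically much faster than save()
fit2 <- qs::qread(f)                     # round-trips the object exactly

qs::qsave(fit, f, preset = "fast", nthreads = 2)   # trade file size for speed
unlink(f)
```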
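The "save less" idea can be sketched like this; the element names (`draws`, `diagnostics`, `inits`) are hypothetical and will differ depending on your modeling package:

```r
fit <- list(                             # toy stand-in for a fitted model object
  draws       = matrix(rnorm(1e4), 1000, 10),  # posterior draws: keep
  diagnostics = rnorm(1e6),                    # bulky extras: consider dropping
  inits       = rnorm(1e6)
)
f_full <- tempfile(fileext = ".rds")
f_lean <- tempfile(fileext = ".rds")
saveRDS(fit, f_full)                     # everything
saveRDS(fit["draws"], f_lean)            # keep only the posterior draws
c(full = file.size(f_full), lean = file.size(f_lean))
unlink(c(f_full, f_lean))
```

If the draws are all you need for downstream inference, the lean file can be orders of magnitude smaller.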
In the R
blrm function (in the rmsb package) I go to a lot of trouble to save lean fit objects, and to reference such saved files when determining whether anything has changed that would require running the sampling again. You might profit from looking at the code on CRAN or
GitHub.com/harrelfe. I also make sure that no environments are carried along, for example when storing functions created during model fitting. Such R functions can carry along multi-gigabyte environments that are not needed, so I store them as character strings in the fit object to avoid that.
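Here is a small demonstration of the environment problem described above; `length(serialize(f, NULL))` shows how many bytes the function would occupy inside a saved object, and the `deparse()` trick at the end is a sketch of the character-string approach with made-up names:

```r
make_fn <- function() {
  big <- rnorm(1e6)                      # ~8 MB, captured in the closure's environment
  function(x) x + 1                      # never touches `big`, but drags it along
}
f <- make_fn()
length(serialize(f, NULL))               # millions of bytes: environment included

environment(f) <- globalenv()            # one fix: detach the heavy environment
length(serialize(f, NULL))               # now only a few hundred bytes

# The character-string approach: store the source, re-create the function on load.
f_src <- deparse(make_fn())              # character vector, no environment baggage
g <- eval(parse(text = f_src))
g(1)                                     # behaves the same as the original
```

Resetting the environment is only safe when the function doesn't actually use anything from it, which is why storing the source text is the more conservative option.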