Reducing brms model output size for predictive function


#1

Hi there,

I want to use brms model fits within an R package function I am writing so that users can get predicted values with their new data. Each model fit object is ~14-15 MB, however, and with six models in total this makes the sysdata.rda file very large (~52 MB even when compressed).

Can anyone provide any advice about how I could reduce the file size of each model object whilst retaining its ability to predict with new data?

Thanks in advance!

Liam

Please also provide the following information in addition to your question:

  • Operating System: Mac OS X High Sierra/Windows 10
  • brms Version: 2.3.6

#2

For what purpose exactly do you want to store the brms models in an R package?

Asked differently, what is the scope of your R package?


#3

The purpose of the R package is to provide predictive models for estimating pollinating insects’ body size. The function we have built provides three model options, depending on the user’s data/hypotheses, for prediction in two pollinating taxa, bees or hoverflies (so total of 6 brms fits). I want to use the brms fits within the function so it returns estimates as well as the S.E. and 95% CI’s.


#4

You can set save_dso = FALSE when fitting your model. The rest of the size is mainly the posterior samples, so you can store fewer posterior samples at the expense of slightly less accurate estimates.
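A minimal sketch of both suggestions, assuming a hypothetical body-size model (the formula `body_size ~ itd` and the data frame `bees` are placeholders, not from the original thread; `save_dso` is passed through to rstan when fitting):

```r
library(brms)

# Hypothetical refit that keeps the stored object smaller:
# fewer chains/iterations mean fewer stored posterior draws, and
# save_dso = FALSE drops the compiled dynamic shared object.
fit <- brm(
  body_size ~ itd,     # placeholder formula
  data     = bees,     # placeholder data frame
  chains   = 2,        # fewer chains -> fewer stored draws
  iter     = 2000,     # total iterations per chain
  warmup   = 1000,     # warmup draws are discarded from the output
  save_dso = FALSE     # do not store the compiled model binary
)

# Check the on-disk footprint before bundling into the package:
saveRDS(fit, "fit_bees.rds", compress = "xz")
file.size("fit_bees.rds")
```

Whether this shrinks the object enough will depend on how much of the 14-15 MB is draws versus the compiled model; checking `object.size()` on the fit's components is a quick way to find out.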


#5

Thanks for the tips @paul.buerkner. I did think I might be asking too much, but I will give save_dso = FALSE a try and see how it goes to begin with.


#6

If your models are fixed in terms of the Stan code they generate, you can use brms to generate the Stan code, then deploy the precompiled models with the package skeleton from rstantools. See the rstantools vignette.

The advantage is that users will get a pure binary install and won’t need to find a C++ toolchain and install all of RStan.


#7

Hi @Bob_Carpenter, thank you for the suggestion and the helpful link. Will this require that the function runs the model internally, as opposed to loading a saved model object?
Perhaps a better question is: will a Stan model object be significantly smaller than a brms fit object? Based on how long the models take to fit, running them internally within the function isn't feasible.


#8

I’m not sure how it works at the technical level. @bgoodri and @jonah will know.

From a user’s perspective, all the models are compiled and downloaded in binary form with the package, so the user doesn’t need a C++ toolchain.

I don’t know what’s in a brms fit object, either. The RStan fit object contains a bunch of stuff you probably won’t need for inference. You can probably get away with pulling out just the draws for the parameters you care about and saving those; RStan does let you specify which parameters to save.

If the fit objects are too big, how many draws do they contain? You usually don’t need a very large n_eff for inference, yet we often see people running tens or hundreds of thousands of iterations. So reducing the number of draws is another way to economize on fit size.
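As a rough sketch of the "pull out just the draws you care about" idea: assuming a fitted rstan object `fit` with placeholder parameter names (not from the original models), you could extract and store only those draws rather than the whole fit:

```r
library(rstan)

# Hypothetical stanfit object 'fit'; parameter names below are
# illustrative placeholders, not the thread's actual models.
draws <- rstan::extract(fit, pars = c("b_Intercept", "b_itd", "sigma"))

# Optionally subsample the draws further, e.g. keep every 4th draw:
draws_thin <- lapply(draws, function(x) x[seq(1, length(x), by = 4)])

# Store just the draws, heavily compressed, for use in prediction code.
saveRDS(draws_thin, "bee_draws.rds", compress = "xz")
```

The trade-off is that you then have to write the prediction math (posterior mean, SE, and 95% intervals from the stored draws) yourself, since brms's `predict()` method needs the full fit object.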


#9

I don’t think precompiling the Stan model will solve the problem here, since the aim seems to be to ship pre-fit models just to run post-processing on them. Precompiling would only help with fitting the models; it won’t reduce the size of the fitted model objects themselves.