Creating an R package using brms

I’m creating my second real R package, the first that uses any Stan functionality. For now, it’s proprietary, so I can’t upload it to GitHub.

I’m running R 3.6.1 on Windows 10, and all my packages are current as of yesterday. In particular, I’m running brms 2.9.0. My normal IDE is ESS.

Since the last time I did a package, usethis has come into the picture, so I’m trying to learn somewhat new processes, build on what I knew, and refer to Hadley’s book as much as I can.

My immediate question has to do with a model implemented in brms. There are three ways that model is to be used:

  1. I want to save a fitted model in the package and then use it as the default model for prediction.
  2. I want to allow the user to fit the model to new data and save the fitted model for later reuse.
  3. I want to allow the user to use the model to predict outcomes in an optimization process using DEoptim or RcppDE.

I discovered “Guidelines for developers of R packages interfacing with Stan” today, which leads to my first question:

  • Should I simply run rstan_package_skeleton over top of my fledgling package? I see that usethis::create_package says it “can be called on an existing project”; I haven’t found any such statement in the rstan_package_skeleton documentation yet.

I also read “Step by step guide for creating a package that depends on RStan.” It seems focused on Stan models and Stan source. My second question:

  • Is there corresponding information that points to where and how I deal with a brms-based model?

Given my three model uses,

  • First, is there a preferred way to store the fitted model as part of the package and recall it later? I know a bit about the data() function; I sorta want the same thing for a brmsfit object, I think.
  • I think the second use is simple: if I export the function that fits the model, the user can fit the model to whatever data is appropriate, and the brm() file argument can say where it should be stored.
  • I think the third use case is similar to the first. It sounds easy to predict values from an existing model if I can pass the model to the function, which seems straightforward in the second case. I don’t think I know how to pass the default model (use case 1) properly.
  • Is there an existing, perhaps not too complex, package that does much of what I’ve described so that I can look at it?
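
A minimal sketch of the third use case, under stated assumptions: the fitted model is passed into a closure that the optimizer then calls. The helper make_objective, the predictor name x, and the target value are hypothetical, and lm() stands in for brm() so the example runs without Stan. Base R's optimize() is used here; the closure could be handed to DEoptim in the same way.

```r
# Build an objective function that closes over a fitted model.
# `fit` is any object with a predict() method; lm() is a stand-in here.
make_objective <- function(fit, target) {
  function(par) {
    newdata <- data.frame(x = par[1])
    pred <- predict(fit, newdata = newdata)
    # For a brmsfit, predict() returns a matrix whose first column is the
    # estimate; for lm() it returns a plain numeric vector.
    est <- if (is.matrix(pred)) pred[1, 1] else pred[[1]]
    (est - target)^2
  }
}

set.seed(1)
d <- data.frame(x = seq(0, 10, length.out = 50))
d$y <- 2 * d$x + 1 + rnorm(50, sd = 0.1)
fit <- lm(y ~ x, data = d)

# Find the x whose predicted outcome is closest to 11.
obj <- make_objective(fit, target = 11)
res <- optimize(obj, interval = c(0, 10))

# With DEoptim the call would look like:
# DEoptim::DEoptim(obj, lower = 0, upper = 10)
```

Because the model is an argument rather than a global, the same objective works for the packaged default model and for a model the user has refitted. Note that predict() on a brmsfit draws from the posterior predictive distribution, so the objective would be noisy unless the seed is fixed.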

I welcome any and all insights; I realize I’m at the front of this learning curve.

I would say a viable process is approximately:

  1. Use brms::make_stancode to print the Stan program to the screen
  2. Copy and paste that to a .stan file
  3. Follow the vignettes in the rstantools package (however, it is best to use the GitHub version rather than the CRAN version at the moment)
  4. Write an R function that takes the user’s inputs, calls the generated C++ code, and returns the results. In principle, you could just call brms::make_standata to build the data, but you can probably provide a more intuitive interface for your particular model.
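
Steps 1 and 2 can also be scripted rather than done by copy and paste. This is a rough sketch, not a definitive recipe: write_stan_program is a hypothetical helper, and the formula, data, and output path in the usage comment are placeholders that depend on where your package setup expects its .stan files.

```r
# Hypothetical helper: write the Stan program that brms would generate
# for a given formula and data set to a .stan file.
write_stan_program <- function(formula, data, path, ...) {
  if (!requireNamespace("brms", quietly = TRUE)) {
    stop("brms must be installed to generate the Stan code")
  }
  # make_stancode() returns the Stan program as a character string.
  code <- brms::make_stancode(formula, data = data, ...)
  writeLines(as.character(code), con = path)
  invisible(path)
}

# Usage (not run here; `my_data` and the path are placeholders):
# write_stan_program(y ~ x + (1 | g), data = my_data,
#                    path = "inst/stan/my_model.stan")
```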

Thanks, Ben. I figured that might be the case; I was just curious if there was something like save_brmsmodel and brmsmodel matching save and data for package data. Would that be a useful enhancement at some time?

Any insights into whether rstan_package_skeleton is safe to run on an existing package? If not, I’ll see if I can figure it out from reading that function definition.

Saving a compiled model does not work well: the saved object can rarely be loaded on any computer other than the one that created it. That route is therefore not suitable for a package, unless you are only using it yourself on one machine, in which case it is probably easier to use some structured R scripts rather than a full package. In general, such a package needs to be compiled on each machine where it is installed.

rstan_package_skeleton is gone from the GitHub version and would not work on an existing package anyway. Its replacements are rstan_create_package, which creates a new package, and use_rstan, which adds Stan functionality to an existing package. Even so, it may be simplest to call rstan_create_package in a new directory and copy some of your previous files into it.
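
As a rough sketch of how those two entry points might be called, assuming the GitHub version of rstantools is installed: set_up_stan_package is a hypothetical wrapper, and only the path, stan_files, and pkgdir arguments are used; other arguments are left at their defaults.

```r
# Hypothetical wrapper around the two rstantools functions named above.
set_up_stan_package <- function(path, existing = FALSE,
                                stan_files = character()) {
  if (!requireNamespace("rstantools", quietly = TRUE)) {
    stop("rstantools must be installed")
  }
  if (existing) {
    # Add Stan infrastructure to a package that already exists at `path`.
    rstantools::use_rstan(pkgdir = path)
  } else {
    # Create a fresh package at `path`, optionally seeded with .stan files.
    rstantools::rstan_create_package(path = path, stan_files = stan_files)
  }
}

# Usage (not run here; the paths are placeholders):
# set_up_stan_package("mynewpkg", stan_files = "my_model.stan")
# set_up_stan_package("path/to/existing/pkg", existing = TRUE)
```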

I’m following your advice to use some structured R scripts (probably embedded in an Org Mode file).

As for saving a brmsfit object in some fashion, I was thinking it might make use of (a modified version of) the file argument in brm(): if a compiled model exists that was built on this machine (I was thinking “this architecture,” but I can see how that could be insufficient), then load it. Otherwise, compile and save the file.

That would solve my problem of wanting a brmsfit object on which to run predict(), and it would be relatively efficient: on each new machine on which it was run, it would only get compiled one time.
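
The load-if-present, otherwise-fit-and-save idea can be sketched in a few lines of base R. This is only an illustration: cached_fit and fit_once are hypothetical names, lm() stands in for brm() so the example runs without Stan, and a temporary directory stands in for whatever per-machine cache location would be appropriate. As far as I can tell from its documentation, the file argument of brm() already behaves this way for the fitted model: it loads the .rds file if it exists and otherwise fits and saves.

```r
# Return the cached model if the file exists; otherwise fit, save, and return.
cached_fit <- function(file, fit_fun) {
  if (file.exists(file)) {
    return(readRDS(file))
  }
  fit <- fit_fun()
  dir.create(dirname(file), recursive = TRUE, showWarnings = FALSE)
  saveRDS(fit, file)
  fit
}

cache_file <- file.path(tempdir(), "default_model.rds")
unlink(cache_file)  # start from a clean cache for this demonstration

# Count how often the (expensive) fitting step actually runs.
calls <- 0
fit_once <- function() {
  calls <<- calls + 1
  lm(dist ~ speed, data = cars)  # stand-in for a brm() call
}

m1 <- cached_fit(cache_file, fit_once)  # fits and saves
m2 <- cached_fit(cache_file, fit_once)  # loads from disk, no refit

predict(m2, newdata = data.frame(speed = 10))
```

The second call never invokes the fitting function, which is the "only once per machine" behaviour described above.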

I can see that might cost more than it’s worth, but I do see value in being able to create a Stan model that’s useful for predictions in a production application. Maybe I’m just thinking about this the wrong way. At any rate, I’m good for now.