I’m creating my second real R package, the first that uses any Stan functionality. For now, it’s proprietary, so I can’t upload it to GitHub.
I’m running R 3.6.1 on Windows 10, and all my packages are current as of yesterday. In particular, I’m running brms 2.9.0. My normal IDE is ESS.
Since the last time I did a package, `usethis` has come into the picture, so I’m trying to learn somewhat new processes, build on what I knew, and refer to Hadley’s book as much as I can.
My immediate question has to do with a model implemented in `brms`. There are three ways that model is to be used:
- I want to save a fitted model in the package and then use it as the default model for prediction.
- I want to allow the user to fit the model to new data and save the fitted model for later reuse.
- I want to allow the user to use the model to predict outcomes in an optimization process using `DEoptim` or `RcppDE`.
I discovered “Guidelines for developers of R packages interfacing with Stan” today, which leads to my first question:
- Should I simply run `rstan_package_skeleton` over top of my fledgling package? I see that `usethis::create_package` says it “can be called on an existing project”; I haven’t found any such statement in the `rstan_package_skeleton` documentation yet.
I also read “Step by step guide for creating a package that depends on RStan.” It seems focused on Stan models and Stan source. My second question:
- Is there corresponding information that points to where and how I deal with a `brms`-based model?
Given my three model uses,
- First, is there a preferred way to store the fitted model as part of the package and recall it later? I know a bit about the `data()` function; I sorta want the same thing for a `brmsfit` object, I think.
- I think the second use is simple: if I export the function that fits the model, the user can fit the model to whatever data is appropriate, and the `file` argument of `brm()` can say where it should be stored.
- I think the third use case is similar to the first. It sounds easy to predict values from an existing model if I can pass the model to the function, which seems straightforward in the second case. I don’t think I know how to pass the default model (use case 1) properly.
- Is there an existing, perhaps not too complex, package that does much of what I’ve described so that I can look at it?
I welcome any and all insights; I realize I’m at the front of this learning curve.
I would say a viable process is approximately:
- Use `brms::make_stancode` to print the Stan program to the screen
- Copy and paste that to a .stan file
- Follow the vignettes in the rstantools package (however, it is best to use the GitHub version rather than the CRAN version at the moment)
- Write an R function that inputs stuff, calls the generated C++ code, and outputs stuff. In principle, you could just call `brms::make_standata`, but you can probably provide a more intuitive interface for your particular model.
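As a rough illustration of the first two steps, the generated Stan program can be written straight to a file instead of being copied from the screen. This is only a sketch: `write_brms_stan()`, the `mpg ~ wt` formula, and the temporary output path are hypothetical stand-ins, and where the `.stan` file should live inside the package depends on the rstantools version you follow.

```r
# Sketch only: write the Stan program that brms generates to a .stan file.
# `write_brms_stan()` is a hypothetical helper, not part of brms or rstantools.
write_brms_stan <- function(formula, data, path, ...) {
  if (!requireNamespace("brms", quietly = TRUE)) {
    stop("Package 'brms' is required.", call. = FALSE)
  }
  # make_stancode() only generates the Stan program; it does not compile it
  code <- brms::make_stancode(formula, data = data, ...)
  writeLines(as.character(code), con = path)
  invisible(path)
}

# Example with a built-in data set and a placeholder model
stan_path <- file.path(tempdir(), "mymodel.stan")
if (requireNamespace("brms", quietly = TRUE)) {
  write_brms_stan(mpg ~ wt, data = mtcars, path = stan_path)
  # The data list in the layout the generated program expects
  stan_data <- brms::make_standata(mpg ~ wt, data = mtcars)
  str(stan_data, max.level = 1)
}
```

The guard around the example means nothing runs unless brms is installed; in a real package the formula and data would of course be your own.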
Thanks, Ben. I figured that might be the case; I was just curious if there was something like `save_brmsmodel` and `brmsmodel` matching `save` and `data` for package data. Would that be a useful enhancement at some time?
Any insights into whether `rstan_package_skeleton` is safe to run on an existing package? If not, I’ll see if I can figure it out from reading that function definition.
Saving a compiled model does not work well in the sense that the saved objects can rarely be loaded outside the computer they were created on. Thus, that route is not suitable for a package, unless you are only using it yourself on one machine in which case it is probably easier to just use some structured R scripts rather than a full package. In general, such a package needs to be compiled on each machine where it is to be installed.
The `rstan_package_skeleton` function is gone from the GitHub version and would not work on an existing package anyway. It is now `rstan_create_package` to create a new package or `use_rstan` to add Stan functionality to an existing package, but you may just need to call `rstan_create_package` in a new directory and copy some of your previous files into it.
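For reference, the two entry points could be wrapped roughly like this. It is a minimal sketch assuming an rstantools version that provides `rstan_create_package()` and `use_rstan()`; the wrapper `setup_stan_package()` and its `existing` flag are hypothetical, and nothing is created until the function is actually called.

```r
# Sketch only: choose between the two rstantools entry points.
# `setup_stan_package()` is a hypothetical wrapper.
setup_stan_package <- function(path, existing = FALSE) {
  if (!requireNamespace("rstantools", quietly = TRUE)) {
    stop("Package 'rstantools' is required.", call. = FALSE)
  }
  if (existing) {
    # Add Stan infrastructure to a package that already exists at `path`
    rstantools::use_rstan(pkgdir = path)
  } else {
    # Create a fresh package skeleton at `path`
    rstantools::rstan_create_package(path = path, rstudio = FALSE, open = FALSE)
  }
  invisible(path)
}

# Not run here, because it writes files:
# setup_stan_package("path/to/newpkg")                 # new package
# setup_stan_package("path/to/mypkg", existing = TRUE) # existing package
```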
I’m following your advice to use some structured R scripts (probably embedded in an Org Mode file).
As for saving a brmsfit object in some fashion, I was thinking it might make use of (a modified version of) the `file` argument in `brm()`: if a compiled model exists that was built on this machine (I was thinking “this architecture,” but I can see how that could be insufficient), then load it. Otherwise, compile and save the file.
That would solve my problem of wanting a brmsfit object on which to run `predict()`, and it would be relatively efficient: on each new machine on which it was run, it would only get compiled one time.
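The load-if-present, otherwise fit-and-save idea can be sketched in base R. Everything here is illustrative: `fit_or_load()` is a hypothetical helper, the cache lives in `tempdir()` only for the sake of the example, and `lm()` stands in for the expensive `brm()` call so the snippet runs without Stan.

```r
# Sketch only: fit once per machine, then reuse the saved object.
# `fit_or_load()` is a hypothetical helper; `fit_fun` is any zero-argument
# function that returns a fitted model.
fit_or_load <- function(path, fit_fun) {
  if (file.exists(path)) {
    return(readRDS(path))          # reuse the object saved earlier
  }
  fit <- fit_fun()                 # expensive step, done only once
  dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
  saveRDS(fit, path)
  fit
}

# Demonstration with a cheap stand-in model
cache_file <- file.path(tempdir(), "default_model.rds")
unlink(cache_file)                 # start from a clean cache for the demo
n_fits <- 0
fit_model <- function() {
  n_fits <<- n_fits + 1            # count how often the model is really fitted
  lm(mpg ~ wt, data = mtcars)
}

m1 <- fit_or_load(cache_file, fit_model)  # fits and saves
m2 <- fit_or_load(cache_file, fit_model)  # loads from the cache
predict(m2, newdata = data.frame(wt = 3))
```

After both calls `n_fits` is 1, since the second call reads the cached file. With brms, `fit_fun` would wrap the `brm()` call, and the cache path would need to be a per-machine location rather than a temporary directory.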
I can see that might cost more than it’s worth, but I do see value in being able to create a Stan model that’s useful for predictions in a production application. Maybe I’m just thinking about this the wrong way. At any rate, I’m good for now.