R Wrapper for CmdStan

I’m following up with @Bob_Carpenter on a discussion from the Stan meeting, where the idea of a lightweight R wrapper for CmdStan came up. I’m sharing some work @billg and I did a while ago.

The attached file contains functions to run CmdStan from R using the system() function. I think it’s straightforward, but I can share examples if needed. If there is enough interest, I can post this in a stand-alone GitHub repo. cmdStanTools.R (2.2 KB)
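The attached cmdStanTools.R is not reproduced in this thread, so the following is only a minimal sketch of what a system()-based wrapper could look like, not the attached code. The helper names `build_cmdstan_command` and `run_cmdstan_model` and their parameters are hypothetical; the `sample`, `data file=`, `output file=`, and `random seed=` tokens follow CmdStan's command-line conventions, and `exe` is assumed to be a model executable that CmdStan has already compiled.

```r
# Hypothetical helper: build the shell command for one CmdStan run.
# `exe` is assumed to be the path to an already compiled model executable.
build_cmdstan_command <- function(exe, data_file, output_file,
                                  num_samples = 1000, seed = 1234) {
  paste(
    shQuote(exe),
    "sample", paste0("num_samples=", num_samples),
    "data", paste0("file=", shQuote(data_file)),
    "output", paste0("file=", shQuote(output_file)),
    "random", paste0("seed=", seed)
  )
}

# Hypothetical helper: run the command through base R's system().
run_cmdstan_model <- function(exe, data_file, output_file, ...) {
  if (!file.exists(exe)) stop("Model executable not found: ", exe)
  cmd <- build_cmdstan_command(exe, data_file, output_file, ...)
  status <- system(cmd)  # with intern = FALSE, system() returns the exit status
  if (status != 0) stop("CmdStan exited with status ", status)
  invisible(output_file)
}

# Building the command does not require CmdStan to be installed.
cmd <- build_cmdstan_command("./bernoulli", "bernoulli.data.json", "output.csv")
print(cmd)
```

Separating command construction from execution keeps the string easy to inspect or log before anything is run.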

Do these run commands in a “raw” terminal or through a special R library? (In Python, for example, subprocess is the preferred tool for running commands, as opposed to the raw os.system.)

It has been on my to-do list for ~2 years (since roughly StanCon 2016) to make a lightweight R interface to Stan using only base R (see https://github.com/duncantl/rstanapi). Sometimes the bells and whistles of rstan are nice, but its heavy dependency debt makes rstan difficult to use in some environments. For example, on a fresh Linux image, installing rstan can take 30+ min. I like the approach of Stannis, but working with the config files can be clunky.

I was never sure if there was a real desire for this kind of thing, which is why it has sat for so long.

@ahartikainen I’m not sure. I’m guessing “raw” terminal.

@mespe I use this tool to work with development versions of Stan, while still using R as a scripting language. When using a released version of Stan, I’ve been quite happy with RStan. I think @Bob_Carpenter has a better sense of how useful such a wrapper would be to the community. I’ll take a look at your code when I get a chance.

Currently Stannis also takes lists as arguments… What sort of interface do you think would be ideal? It would be straightforward to reproduce something similar to rstan for the control arguments (has anybody written a spec for Stan’s flat argument structure yet?). I’ve put some time into that code and I use it almost daily, so I’d like to see some broader use.
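To make the "lists as arguments" idea concrete, here is a minimal sketch of turning a nested R list of control arguments into CmdStan's flat command-line tokens. This is not Stannis's actual code; `flatten_args` and the list layout are assumptions made for illustration. The convention assumed here is that a list-valued entry becomes a bare keyword followed by its children, and a scalar entry becomes `name=value`.

```r
# Hypothetical helper: flatten a nested, named list into CmdStan-style tokens.
flatten_args <- function(args) {
  if (length(args) == 0) return(character(0))
  stopifnot(is.list(args), !is.null(names(args)), all(nzchar(names(args))))
  tokens <- character(0)
  for (i in seq_along(args)) {
    name <- names(args)[i]
    value <- args[[i]]
    if (is.list(value)) {
      # A sub-list becomes a bare keyword followed by its own tokens.
      tokens <- c(tokens, name, flatten_args(value))
    } else {
      stopifnot(length(value) == 1)
      # scientific = FALSE avoids tokens such as "num_samples=1e+05".
      tokens <- c(tokens, paste0(name, "=", format(value, scientific = FALSE)))
    }
  }
  tokens
}

control <- list(
  sample = list(num_samples = 1000, num_warmup = 500, adapt = list(delta = 0.9)),
  data   = list(file = "model.data.json"),
  output = list(file = "output.csv")
)
tokens <- flatten_args(control)
cat(paste(tokens, collapse = " "), "\n")
```

The resulting character vector can be passed as the `args` argument of system2(). Note that this sketch does no validation of argument names or ordering against what CmdStan accepts.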

I think what we were experimenting with was slightly different from what either these functions or Stannis do: rather than wrapping CmdStan in R, we were attempting to mimic it using base R functions. In our code, there would be no system() calls to the CmdStan executables.

The idea was to get an interface with the benefits of both cmdstan (simple, stable) and rstan (data and results directly in R without intermediate rdump/csv files).

I also use RStan and am generally happy with it, but for some applications it is overly complex. I hate the massive dependency debt, and debugging issues takes some additional steps due to its reliance on Rcpp. In theory, it should be possible to build an R interface to Stan that only requires the C++ libraries, which is what we were experimenting with.

Ok, I see that after looking at your repo.

You might as well wait for the base class to be added to Stan b/c then this will be very straightforward (even with Rcpp).

I’ve written an R wrapper for CmdStan (github.com/sakrejda/stannis) that is generally reliable. I think the only idiosyncratic thing about it is that it runs on configurations specified in .yaml files. Those config files are immediately turned into R lists, so there’s no barrier to making the main interface more standard.

The current features are:

  • Does error checking on the inputs when it constructs the CmdStan call
  • Does error checking to catch file system operations that don’t come out as expected
  • Handles doing a local install of CmdStan
  • Recompiles models as needed
  • Runs parallel chains using parallel::clusterMap
  • Captures all the output to separate files in a per-chain directory
  • Records all the metadata about each run in plain text files so it doesn’t have to be parsed out from CmdStan output
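A rough sketch of how the parallel-chain, per-chain-directory, and plain-text-metadata features could fit together is shown below. It is not taken from Stannis: `run_chain`, `run_chains`, the `chain-01` directory naming, and the `metadata.txt` layout are all assumptions. The actual work for each chain is delegated to a caller-supplied `run_one` function, which in practice might invoke the CmdStan executable.

```r
# Hypothetical per-chain worker: creates the chain's own directory, records
# run metadata as plain text, then calls `run_one` to do the actual work.
run_chain <- function(chain_id, seed, out_root, run_one) {
  chain_dir <- file.path(out_root, sprintf("chain-%02d", chain_id))
  dir.create(chain_dir, recursive = TRUE, showWarnings = FALSE)
  writeLines(
    c(paste0("chain_id: ", chain_id),
      paste0("seed: ", seed),
      paste0("started: ", format(Sys.time(), "%Y-%m-%d %H:%M:%S"))),
    file.path(chain_dir, "metadata.txt")
  )
  run_one(chain_id = chain_id, seed = seed, chain_dir = chain_dir)
  chain_dir
}

# Run chains sequentially when no cluster is given, otherwise spread them
# over the cluster's workers with parallel::clusterMap().
run_chains <- function(n_chains, out_root, run_one, base_seed = 1, cl = NULL) {
  ids <- seq_len(n_chains)
  seeds <- base_seed + ids - 1
  more <- list(out_root = out_root, run_one = run_one)
  if (is.null(cl)) {
    mapply(run_chain, ids, seeds, MoreArgs = more, SIMPLIFY = FALSE)
  } else {
    parallel::clusterMap(cl, run_chain, ids, seeds, MoreArgs = more)
  }
}

# Stand-in for a real CmdStan call: writes a placeholder output file.
stub_run <- function(chain_id, seed, chain_dir) {
  writeLines("placeholder", file.path(chain_dir, "output.csv"))
}

out_root <- file.path(tempdir(), "stan-run")
chain_dirs <- run_chains(2, out_root, stub_run)
```

To run in parallel, create a cluster with `cl <- parallel::makeCluster(2)`, pass it as `cl = cl`, and call `parallel::stopCluster(cl)` afterwards. Because `run_one` is shipped to the workers, it should rely only on its arguments and on packages available there.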

The main experimental feature is translating CmdStan output into a binary representation and making it accessible from R parameter-by-parameter without having to deal with a .csv file. That’s something that should probably become an option for CmdStan instead.
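As an illustration of the general idea only, the sketch below reads a CmdStan-style CSV and writes each column to its own file of raw doubles, so that a single parameter can be read back without touching the others. The on-disk layout (one `.bin` file per column) and the helper names are assumptions, not the binary representation Stannis uses. It relies on the fact that CmdStan writes its configuration and adaptation notes as lines starting with `#`.

```r
# Read a CmdStan-style output CSV, skipping the "#" comment lines.
read_cmdstan_csv <- function(path) {
  read.csv(path, comment.char = "#", check.names = FALSE)
}

# Hypothetical format: one file of raw doubles per column of the draws table.
write_draws_binary <- function(draws, dir) {
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  for (name in names(draws)) {
    writeBin(as.double(draws[[name]]), file.path(dir, paste0(name, ".bin")))
  }
  invisible(dir)
}

# Read back a single parameter without loading any other column.
read_parameter <- function(dir, name) {
  path <- file.path(dir, paste0(name, ".bin"))
  n <- file.size(path) / 8  # as.double() values are written as 8 bytes each
  readBin(path, what = "double", n = n)
}

# Tiny made-up file in the same shape as CmdStan output.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("# model = demo",
             "lp__,theta",
             "-1.5,0.25",
             "# Adaptation terminated",
             "-2.0,0.75"), csv_path)

draws <- read_cmdstan_csv(csv_path)
bin_dir <- write_draws_binary(draws, file.path(tempdir(), "draws-bin"))
theta <- read_parameter(bin_dir, "theta")
```

A real implementation would also need to record the number of draws, chain identity, and byte order alongside the data.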

I suggest using a modified version of my package to fill this niche, obv. after a discussion about what sort of interface is preferable and whether some of the features should be removed/simplified. (cc @Bob_Carpenter since this is a TWG sort of area).

re: @ahartikainen: in R the preferred tool is the system2() function. It’s not as comprehensive as subprocess but it works fine.
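For comparison with Python's subprocess, here is a small sketch of capturing a child process's output and exit status with system2(). The wrapper name `run_captured` is hypothetical; system2() and its `stdout`/`stderr` arguments are base R. The child process here is just Rscript evaluating a trivial expression, standing in for a CmdStan executable.

```r
# Hypothetical wrapper: run a command, capture stdout and stderr together,
# and return the exit status alongside the captured lines.
run_captured <- function(command, args = character()) {
  # With stdout = TRUE, system2() returns the output as a character vector
  # and attaches a "status" attribute (with a warning) on a non-zero exit.
  out <- suppressWarnings(system2(command, args, stdout = TRUE, stderr = TRUE))
  status <- attr(out, "status")
  list(
    status = if (is.null(status)) 0L else status,
    output = as.character(out)
  )
}

# Use the Rscript that ships with the running R as a stand-in command.
rscript <- file.path(R.home("bin"), "Rscript")
res <- run_captured(rscript, c("--vanilla", "-e", shQuote("cat(6 * 7)")))
print(res$output)
```

Unlike system(), system2() takes the command and its arguments separately, which avoids assembling one long shell string by hand.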

I had thought this is what rstanarm provides, but I suppose we would get more than just the C++ libraries with rstan.

And I’m assuming also that rstantools must not provide CmdStan bindings? Why wouldn’t Stannis be folded in there?

As far as wrappers for CmdStan go, I think Stannis generally does a good job. I looked at using it for a client project, but decided not to because of the .yaml config files. Adding an R function to handle the config would help novice users a lot.

Related: I wrote a small R package, {instantiate}, for pre-compiling CmdStan models inside other R packages that use cmdstanr. See “Pre-Compiled CmdStan Models in R Packages • instantiate”.