CmdStanR & sourceCpp / cppFunction for exposing functions

Hi!

I was wondering whether there has been any work on getting rstan’s expose_stan_functions working with cmdstanr. A possible way to get this going (I think) could be via the sourceCpp or cppFunction functions from Rcpp. Has this been considered yet? Is it an option at all, given that Rcpp is GPL-licensed?

I am going to try to get a custom C++ function based on Stan Math working from R. From the documentation it looks feasible, but I haven’t done anything beyond reading it yet.

Or has someone already worked this out?

Sebastian

This was my most recent attempt: https://gist.github.com/rok-cesnovar/c91d631b66c1124ae7f221155f87516f

It worked fine for me on Linux; I am almost sure it doesn’t work on Windows (yet).

It requires CmdStan 2.25, or rather the stanc3 version shipped with 2.25 (that version supports standalone functions).

It should also restore the flags it sets to their previous state. The intention was just to show that it can be done, so there may be more user-facing holes.
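
The save-and-restore idea can be illustrated as an RAII guard around an environment variable. This is only an analogue in C++: the gist itself is R code, the class name `ScopedEnvVar` is hypothetical, and `setenv`/`unsetenv` are POSIX calls, so this sketch assumes a POSIX system and would not build on Windows as written.

```cpp
#include <stdlib.h>  // POSIX setenv / unsetenv

#include <cstdlib>   // std::getenv
#include <optional>
#include <string>
#include <utility>

// Hypothetical helper (POSIX only): sets an environment variable for the
// lifetime of the object and restores the previous value, or unsets the
// variable if it did not exist before, when the object is destroyed.
class ScopedEnvVar {
 public:
  ScopedEnvVar(std::string name, const std::string& value)
      : name_(std::move(name)) {
    if (const char* old = std::getenv(name_.c_str())) {
      previous_ = std::string(old);  // remember the user's original value
    }
    setenv(name_.c_str(), value.c_str(), /*overwrite=*/1);
  }

  ~ScopedEnvVar() {
    if (previous_) {
      setenv(name_.c_str(), previous_->c_str(), 1);  // put the old value back
    } else {
      unsetenv(name_.c_str());  // the variable was not set before
    }
  }

  ScopedEnvVar(const ScopedEnvVar&) = delete;
  ScopedEnvVar& operator=(const ScopedEnvVar&) = delete;

 private:
  std::string name_;
  std::optional<std::string> previous_;
};
```

Because the destructor also runs when an exception propagates out of the scope (and is caught further up), the user's original values come back however the build step exits.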

I also took a bunch of tricks from RStan, which probably makes this code GPL-licensed (we could possibly get the authors of those lines to agree to release that code under a different license, but I am a newbie when it comes to licensing).

The dependency on Rcpp on its own should not make it GPL as far as I understand things.

Wow! Cool beans.

Windows will require some more care with PATH problems.

I will try this, though I am interested in a pure C++ version, as I want to call the Jacobian function in Stan Math and get the results back.

In that case it should only get easier: there is no need for the stanc3 step or any of the std::ostream* handling.

  • Anything above line 20 of the gist is irrelevant.
  • Put your C++ code in the code variable on line 20.
  • Most of the code from lines 28 to 110 is also irrelevant (it only makes the ostream and RNG arguments work with Rcpp).
  • Lines 116-118 and 147-167 are also not needed.
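
For context on what lines 28 to 110 of the gist deal with: functions that stanc3 generates in standalone mode take extra trailing arguments, such as a `std::ostream*` for print output and, for `_rng` functions, a reference to an RNG. A thin wrapper that supplies those arguments itself leaves only the mathematical arguments to expose to R. The sketch below is an illustration under assumptions: `foo`, `foo_rng` and both wrappers are hypothetical, `std::mt19937` stands in for the Boost RNG Stan uses, and `std::cout` stands in for `Rcpp::Rcout`.

```cpp
#include <iostream>
#include <ostream>
#include <random>

// Hypothetical stand-ins for stanc3-generated functions. Like the generated
// code, they take extra trailing arguments (a print stream and an RNG) that
// an R caller should not have to provide.
double foo(const double& x, std::ostream* pstream) {
  if (pstream != nullptr) {
    *pstream << "foo called with x = " << x << "\n";  // Stan print() output
  }
  return 2.0 * x;
}

template <typename RNG>
double foo_rng(const double& mu, RNG& base_rng, std::ostream* pstream) {
  if (pstream != nullptr) {
    *pstream << "foo_rng called with mu = " << mu << "\n";
  }
  std::normal_distribution<double> dist(mu, 1.0);
  return dist(base_rng);
}

// Wrappers that supply the extra arguments, so only the "math" arguments
// remain to be exposed to R.
double foo_wrapper(double x) {
  return foo(x, &std::cout);  // an Rcpp wrapper would pass &Rcpp::Rcout
}

double foo_rng_wrapper(double mu, unsigned int seed) {
  std::mt19937 rng(seed);            // stand-in for the Boost RNG Stan uses
  return foo_rng(mu, rng, nullptr);  // nullptr: no print output
}
```

Seeding a fresh engine on every call keeps the sketch short, but it means repeated calls with the same seed return the same draw; a real wrapper would more likely keep one RNG alive between calls.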

Thanks. The code now runs and works under macOS with this small addition:

  tbb_dll_ext <- switch(Sys.info()["sysname"],
                        Linux = "so.2",
                        Darwin = "dylib")

Somehow the stream magic did not work, but defining and calling the function “manually” made things work.

What are the plans for this? The expose-functions feature is really handy for Stan models, so having it in cmdstanr as a permanent feature would be amazing. It should also open the door to get_log_prob and the like…

The first step was to get this implementation working with a minimal example. Your confirmation that it works on a Mac with the .dylib fix is a nice step forward. I am exploring other options, such as cpp11, so that we have all the information before we move forward.

I think we want to support both expose_stan_functions and log_prob.

I also have an early-stage stanc3 prototype of expose_stan_functions that also returns gradients, but it hasn’t moved much in the last month, and I am not sure how much need there is for it.

The biggest question for me with this approach is how we can use RStan’s tricks with Rcpp without being forced into the GPL, which would obviously be a deal-breaker.

There is also the option of implementing this in CmdStan and supporting

  model log_prob data file=data.json

and

  model function name="foo" data file="data.json"

This would obviously have higher overhead, but as far as I understand, these features are mostly used for debugging and exploration.

I am leaning towards Rcpp/cpp11, but I really want to make sure we avoid the RStan issues (which I think we do by using the CmdStan source files), and I want this to be as painless as possible.

Echoing what Rok says, this is going to take CmdStanR far beyond a lightweight wrapper, and it’s going to drag CmdStanR into the Rcpp swamp and become a maintenance nightmare.

We desperately need an R interface to Stan that most users can install and use; it needs to be stable and well documented. We don’t have a ton of developers. Rok is doing a heroic job adding features while putting out fires on a daily basis.

Adding this to CmdStanR serves the needs of a very small set of users and will suck up time and resources. Is it worth it? I say no.

Mitzi’s reply brings me to a third option I didn’t mention before: a separate package for this kind of Rcpp/cpp11 functionality, called (cmd)stan_functions or something like that.

Adding expose and log_prob would not make cmdstanr harder to install, since these features would only be used by those who want them, but yes, Rcpp and flag handling are a time sink for sure.

If you’re considering a separate package, what would be the advantages over using rstan?

I have been using cmdstanr for fitting (when I need chain threading or want to use newer syntax) and rstan to expose functions. One obvious drawback is that rstan lags behind in supporting the latest Stan syntax.

This would just extend cmdstanr with a few features (if we go this route).

Some advantages are:

  • New releases are available instantly.
  • We don’t have to rely on RcppEigen, RcppParallel, etc. for other dependencies, only on Rcpp or cpp11, which makes this muuuuch simpler to install and use. Rcpp on its own was never really the issue.
  • The sources to build against are not part of another R package (StanHeaders).

I think that’s right. Calling a function from a GPL package shouldn’t force us to be GPL, but if we included source code from a GPL package then that would. I’m not saying we should definitely depend on Rcpp, but I don’t think the license would be the reason not to do it.

I think log_prob serves a small set of users, but expose_stan_functions is really essential for testing user-defined functions and has a much larger set of users (and I’d argue it should have more users than it probably does). So I think it’s important to eventually offer that in some way, but I 100% agree that we need to make sure to avoid the problems that RStan has run into.

Yeah, I think what you’re currently doing (using both packages) makes sense for now. But basically for all the reasons @rok_cesnovar gave, it would be good to offer this outside of RStan (unless we can somehow solve all of the RStan problems).

I would concur that the expose functionality is quite essential. Regarding @mitzimorris’s concerns: it’s correct that the expose functionality was a big maintenance headache for a long time. That was mostly due to the rather unconventional way it was implemented, by post-processing Stan model .cpp files. Now that the parser has an explicit mode for creating .cpp files intended for exporting functions, this has become much more robust.

We absolutely want to maintain the upside of cmdstanr. To me this is ease of installation, and that does not go away when we add the expose functionality. Moreover, the expose functionality is optional and not required to get Stan models to do inference. The big addition in terms of dependencies, though, is that a working C++ toolchain will be required in R, and we need to decide whether that is OK. Another option could be to put this functionality, which glues Stan and R together so tightly (and requires a working toolchain), into a separate R package, cmdstanr-tools, for example. I am not totally sure, but in any case, the expose functionality is very useful. I am about to use it to test different CVODES tolerance settings, and there this tight integration is really valuable.

You said there is a new version of this expose functionality available as a prototype? Is that online already?

Not yet. I will put it in a repository over the weekend (gists are not really great for versioning); I want to test the Windows behaviour first. It’s mostly cleanup, OS support, and making sure the user’s environment variables remain as they were. I also removed the requirement for RcppEigen.

I noticed that the Stan Math library dependencies are hard-coded into the function, which caused a hiccup for me when SUNDIALS got upgraded. How about extracting the CXXFLAGS from CmdStan with the following command, run in the CmdStan directory?

  make print-CXXFLAGS

Then everything would be built with exactly the same setup as CmdStan. Alternatively, you can extract the individual library settings with make print-CXXFLAGS_SUNDIALS, etc.
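
To turn that output into compiler arguments, one option is to drop the leading `VAR =` part that the `print-%` rule echoes and split the rest on whitespace. A minimal C++ sketch, assuming the output is a single line of the form `CXXFLAGS = <flags>` and that no flag contains quoted spaces; `parse_make_print` is a hypothetical helper, not part of CmdStan.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split one line of `make print-VAR` output of the form
// "VAR = flag1 flag2 ..." into individual flags. Everything up to and
// including the first '=' is treated as the "VAR =" prefix (a make variable
// name cannot contain '='), so flags such as -DFOO=1 survive intact.
// Quoted flags containing spaces are not handled.
std::vector<std::string> parse_make_print(const std::string& line) {
  std::string value = line;
  if (auto pos = line.find('='); pos != std::string::npos) {
    value = line.substr(pos + 1);  // drop the "VAR =" prefix
  }
  std::vector<std::string> flags;
  std::istringstream in(value);
  for (std::string flag; in >> flag;) {
    flags.push_back(flag);  // operator>> splits on any whitespace
  }
  return flags;
}
```

Splitting on whitespace is the simplest choice that works for typical `-I`, `-D` and `-O` flags; if a path with spaces ever appears, a shell-aware tokenizer would be needed instead.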

BTW, microbenchmarking seems to work from R and that is really cool!

Oh, that is a good idea.

Removing the hard-coded paths was part of the refactor I did; I handled it by listing the folders inside the lib folder and working from what that found. But print-CXXFLAGS is a great idea, thanks! I should have time this afternoon to publish this.

Used this for some OpenCL stuff as well. It beats timing in C++ any day :)
