I used rstantools::rstan_package_skeleton to start a new R package based on rstan. I would like to include some data-simulating functions that I wrote in Stan and that I can call from R through expose_stan_functions. Is there a recommended way to do that?
Currently I write an R function like so:
my_r_function = function(params){
  rstan::expose_stan_functions(stanmodels$simulate)
  mat = my_stan_rng(params)
  return(as.data.frame(mat))
}
I understand that this is not a typical use case for (r)stan so feel free to ignore my question. I am already pleased with the 4x speedup compared to my pure R data generating function.
There are solutions to this on the forum, as I recall. The key idea is to use the caching facility of Rcpp’s sourceCpp function. This will allow you to fish out the C++ file you are looking for.
I settled on adding the following bit of code in the automatically generated stanmodels.R before rm(MODELS_HOME). I also added an extra folder src/stan/stan_functions.
The code is very much modelled after what is already in stanmodels.R. That is, it loops over all files in the new folder, exposes the functions in a separate subfolder for each file, and cleans up any old .cpp (hence the separate folders). Finally, compileAttributes gathers and links all the functions.
If I figure out how stanmodels.R gets called at installation, I will move this bit of code into a separate file to keep the original file pristine.
You can automate this through the cleanup script, or by digging into src/Makevars (I think). How these are used is best taken from the rstanarm package.
I don’t think there is a general solution to this yet, which is fine, to be honest. I don’t expect Stan to provide a general-purpose Stan-language-to-C++-to-R wrapper.
See the github discussion
I made it work for my purpose, simulating data, by rewriting the function a little: I added a generated quantities block to generate the data and a data block to allow variable inputs into the data generation. My R function is then a wrapper around rstan::sampling(..., algorithm = "Fixed_param"). This approach has three advantages.
The data section does the input checking for me.
It’s actually 15% faster than the original functions.
It’s closer to prior predictive checking which I will use at some stage anyway.
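As a rough illustration of the Fixed_param approach described above, here is a minimal sketch. The Stan program, the names N, mu, sigma and y_sim, and the wrapper simulate_fixed_param are all hypothetical and only stand in for the actual model in the package; in a package built with rstantools the compiled model would come from stanmodels instead of being compiled by hand.

```r
# Hypothetical Stan program (illustration only, not the poster's model):
# the data block validates the inputs, generated quantities draws the data.
sim_code <- "
data {
  int<lower=1> N;
  real mu;
  real<lower=0> sigma;
}
generated quantities {
  vector[N] y_sim;
  for (n in 1:N) y_sim[n] = normal_rng(mu, sigma);
}
"

# Hypothetical wrapper around rstan::sampling() with algorithm = "Fixed_param".
# `model` is a compiled stanmodel object.
simulate_fixed_param <- function(model, N, mu, sigma, n_sims = 1, seed = 1) {
  fit <- rstan::sampling(
    model,
    data = list(N = N, mu = mu, sigma = sigma),
    algorithm = "Fixed_param",  # no parameters are sampled, only RNG draws
    chains = 1,
    iter = n_sims,
    warmup = 0,
    seed = seed,
    refresh = 0                 # suppress the per-iteration progress output
  )
  # One row per draw, one column per simulated observation
  draws <- rstan::extract(fit, pars = "y_sim")$y_sim
  as.data.frame(t(draws))
}

# Usage (compiles the model, so it needs rstan and a C++ toolchain):
# model <- rstan::stan_model(model_code = sim_code)
# sims  <- simulate_fixed_param(model, N = 10, mu = 0, sigma = 1)
```

Passing an invalid input such as sigma = -1 should be rejected by the data block before any draws are made, which is the input checking mentioned above.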
The disadvantage is the output from rstan::sampling. I couldn’t squash everything so I used sink().
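For the sink() workaround, a slightly safer pattern is to wrap it in a helper that always restores the output stream, even if the wrapped call fails. The helper name quietly is hypothetical, and this is only a sketch: sink() diverts standard output by default, so messages and warnings sent to stderr will still show up, which may be why not everything can be squashed. Setting refresh = 0 in rstan::sampling() should also remove the progress lines.

```r
# Hypothetical helper: evaluate an expression with standard output discarded.
quietly <- function(expr) {
  con <- file(nullfile(), open = "wt")  # nullfile() requires R >= 3.6.0
  sink(con)
  # Restore the output stream and close the connection on exit,
  # including when `expr` throws an error.
  on.exit({
    sink()
    close(con)
  }, add = TRUE)
  force(expr)  # lazy evaluation: the expression runs while the sink is active
}

# Usage: the printed text is discarded, the value is returned.
result <- quietly({
  print("noisy output")
  42
})
```

In the wrapper around rstan::sampling() this would be used as quietly(rstan::sampling(...)).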
In my experience the fixed_param approach is not nice because of issues with the output shapes and, most importantly, its slow speed when you have lots of output. Maybe that has changed in the meantime.
However, you should look forward to rstan 2.18.0 which includes a complete rewrite of the expose facility as I recall.
For the record, I encountered a similar problem where I had a subdirectory containing multiple files, each of which includes multiple functions. I solved it as follows (code included in stanmodels.R):
Automatically exposing Stan functions in a package is already supported in rstantools. Any .stanfunctions files in the package’s inst/stan directory will have the included functions compiled and exported in the package. This is currently being used by the lgpr and rmdcev packages.
Looking at things, it appears to me that my function would still be useful. The thing is that the stream and rng arguments need special attention. The function I wrote automatically sets up the formals of these functions by rewriting the RcppExports. This is a bit of a hack, but it seems to work very reliably and integrates with the overall framework. It would be good to solve the difficulty with the stream and rng arguments in a more principled way (for example, by having a single package-internal instance of the rng); maybe that could be included in rstantools. Also, the rstantools create-package function currently does not accept .stanfunctions files. I am also not sure where these things are documented; at least I was not able to find them quickly.
Just a few points for consideration (great work in any case!).
Thanks. That’s already been an extremely helpful pointer, but I’m failing to implement this on the last stretch. However, it seems that the problem might not actually be with my code: even when I do a clean install of rmdcev and try to use one of its functions, for example rmdcev:::CalcAltOrder(1, 1), I get Error: Expecting an external pointer: [type=integer]., which is the same error I get for my own package. The exact same thing also happens with lgpr:::STAN_var_mask(1:3, 1), for example.
The problem is not with Rcpp as such: when I create a test package with Rcpp::Rcpp.package.skeleton and add a simple function with an argument to its rcpp_hello_world.cpp file, I can call it just fine. That’s why I suspect the problem is somehow caused by how the functions from Stan are written in the .cpp file that rstantools generates.
This is running R 4.3.3, rstantools 2.4.0, rmdcev 1.2.6, Rcpp 1.0.12