Using expose_stan_functions in a package

stijn · October 11, 2018, 11:31am

I used rstantools::rstan_package_skeleton to start a new R package based on RStan. I would like to include some data-simulating functions that I wrote in Stan and that I can call from R through expose_stan_functions. Is there a recommended way to do that?

Currently I write an R function like so:

my_r_function = function(params){
  rstan::expose_stan_functions(stanmodels$simulate)
  mat = my_stan_rng(params)
  return(as.data.frame(mat))
}

I understand that this is not a typical use case for (r)stan so feel free to ignore my question. I am already pleased with the 4x speedup compared to my pure R data generating function.

wds15 · October 11, 2018, 12:43pm

There are solutions to this on the forum, as I recall. The key idea is to use the caching facility of Rcpp’s sourceCpp function. This will allow you to fish out the C++ file you are looking for.
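
A minimal sketch of that idea, assuming rstan is installed and that expose_stan_functions forwards cacheDir to Rcpp::sourceCpp. The helper names stan_cache_dir and expose_cached and the cache location "mypkg" are made up for illustration.

```r
# Hypothetical helper: create (if needed) and return a persistent cache folder.
stan_cache_dir <- function(base = tools::R_user_dir("mypkg", which = "cache")) {
  dir.create(base, recursive = TRUE, showWarnings = FALSE)
  normalizePath(base)
}

# Hypothetical wrapper: expose the functions of one Stan file while keeping
# the generated C++ in cache_dir instead of the session's tempdir().
# Assumes rstan is installed; compilation needs a C++ toolchain.
expose_cached <- function(stan_file, cache_dir = stan_cache_dir()) {
  translated <- rstan::stanc(stan_file, allow_undefined = TRUE,
                             obfuscate_model_name = FALSE)
  rstan::expose_stan_functions(translated, cacheDir = cache_dir)
  # The generated .cpp files should now sit somewhere below cache_dir.
  list.files(cache_dir, pattern = "\\.cpp$", recursive = TRUE,
             full.names = TRUE)
}
```

Calling expose_cached("my_functions.stan") would then return the paths of the C++ files that can be copied into a package's src folder. Sharing one cache folder between parallel R processes is not safe.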

stijn · October 12, 2018, 6:19am

Based on this thread, and this one, and this very sage advice in a third thread, I settled on adding the following bit of code to the automatically generated stanmodels.R, before rm(MODELS_HOME). I also added an extra folder, src/stan/stan_functions.

# Start Addition Stijn
stan_function_files <- dir(file.path(MODELS_HOME, "stan_functions"),
                           pattern = "stan$", full.names = TRUE)
lapply(stan_function_files, function(f) {
  file_name <- sub("\\.stan$", "", basename(f))
  # Translate the Stan file to C++ without compiling a full model
  stan_model <- rstan::stanc(f, allow_undefined = TRUE,
                             obfuscate_model_name = FALSE)
  # One cache folder per Stan file, cleaned before each run
  rstan::expose_stan_functions(stan_model,
                               cacheDir = file.path(MODELS_HOME,
                                                    "stan_functions",
                                                    file_name),
                               cleanupCacheDir = TRUE)
})
Rcpp::compileAttributes()
# End Addition Stijn

The code is very much modelled after what is already in stanmodels.R. That is, it loops over all files in the new folder, exposes the functions in a separate subfolder for each file, and cleans up any old .cpp (hence the separate folders). Finally, compileAttributes gathers and links all the functions.

If I figure out how stanmodels.R gets called at installation, I will move this bit of code into a separate file to keep the original file pristine.

wds15 · October 12, 2018, 8:36am

Looks good.

You can automate this through the cleanup script, or by digging into src/Makevars (I think). The rstanarm package is the best place to see how these are used.

stijn · October 12, 2018, 10:38am

This isn’t the neat solution I thought it was.

stijn · October 13, 2018, 6:33am

I think a general solution is still not a solved issue, which is fine, to be honest. I don’t expect Stan to have general-purpose functionality for wrapping Stan language code into C++ and then into R.

See the github discussion

I made it work for my purpose, simulating data, by rewriting the function a little: I added a generated quantities section to generate the data and a data section to allow for variable inputs into the data generation. My R function is then a wrapper around rstan::sampling(..., algorithm = "Fixed_param"). This approach has three advantages:

The data section does the input checking for me.
It’s actually 15% faster than the original functions.
It’s closer to prior predictive checking which I will use at some stage anyway.

The disadvantage is the console output from rstan::sampling. I couldn’t suppress all of it through its arguments, so I used sink().
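
A minimal sketch of this pattern, assuming rstan is installed. The Stan program, the function name simulate_data and its parameters are illustrative and not the actual model from this post.

```r
# Illustrative Stan program: inputs are declared and range-checked in the
# data block, and the draws are made in generated quantities.
simulate_code <- "
data {
  int<lower=1> N;
  real mu;
  real<lower=0> sigma;
}
generated quantities {
  vector[N] y_sim;
  for (n in 1:N) y_sim[n] = normal_rng(mu, sigma);
}
"

# Hypothetical wrapper: one Fixed_param iteration yields one simulated data set.
# `model` is a compiled stanmodel, e.g. stanmodels$simulate inside a package
# or rstan::stan_model(model_code = simulate_code) in an interactive session.
simulate_data <- function(model, N, mu, sigma, seed = 1) {
  fit <- rstan::sampling(model,
                         data = list(N = N, mu = mu, sigma = sigma),
                         algorithm = "Fixed_param",
                         chains = 1, iter = 1, warmup = 0,
                         seed = seed,
                         refresh = 0)  # silences progress updates only
  draws <- rstan::extract(fit, pars = "y_sim")$y_sim  # matrix: draws x N
  data.frame(y = as.vector(draws))
}
```

Passing an invalid input such as N = 0 makes Stan reject the data before any simulation runs, which is the input checking mentioned above.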

wds15 · October 13, 2018, 3:33pm

In my experience the fixed_param approach is not nice, because of issues with the output shapes and, most importantly, its slow speed when you have lots of output. Maybe that has changed in the meantime.

However, you should look forward to rstan 2.18.0 which includes a complete rewrite of the expose facility as I recall.

bartk · January 11, 2019, 1:33pm

For the record, I encountered a similar problem where I had a subdirectory containing multiple files, each with multiple functions. I solved it as follows (code included in stanmodels.R):

# Concatenate all function files and wrap them in one functions block
code <- paste(sep = "\n", "functions\n{",
              paste(collapse = "\n",
                    unlist(lapply(dir(file.path(MODELS_HOME, "stan_files", "functions"),
                                      pattern = "stan$", full.names = TRUE),
                                  readLines))),
              "}")
# Expose the combined functions in the current environment
a <- rstan::expose_stan_functions(env = environment(),
                                  rstan::stanc(obfuscate_model_name = FALSE,
                                               model_name = "includedFuns",
                                               model_code = code))
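
To see what the assembled string looks like, here is a small base-R sketch of the concatenation step on its own. The helper name build_functions_block and the two demo files are made up for illustration; no Stan installation is needed to run it.

```r
# Hypothetical helper: wrap the contents of all .stan files in `dir_path`
# in a single functions block and return it as one string.
build_functions_block <- function(dir_path) {
  files <- sort(dir(dir_path, pattern = "\\.stan$", full.names = TRUE))
  bodies <- unlist(lapply(files, readLines))
  paste(c("functions {", bodies, "}"), collapse = "\n")
}

# Demo with two throwaway files in a temporary folder.
demo_dir <- file.path(tempdir(), "stan_functions_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("real add_one(real x) { return x + 1; }",
           file.path(demo_dir, "a.stan"))
writeLines("real twice(real x) { return 2 * x; }",
           file.path(demo_dir, "b.stan"))

code <- build_functions_block(demo_dir)
cat(code, "\n")
```

The resulting string is what gets passed as model_code to rstan::stanc in the snippet above. Because the files are simply pasted together, a function must be defined (or forward-declared) before another file uses it, so the alphabetical file order matters.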

wds15 · January 22, 2024, 3:56pm

Here is my solution to this problem:

github.com/stan-dev/rstantools

automatically exposing functions from Stan to R

opened 03:55PM - 22 Jan 24 UTC

wds15

By now I rely on a ton of Stan functions and regularly expose them to R, which quite often means that I have to compile the C++ code in every new R session. `sourceCpp` from `Rcpp` can be made to always use the same cache directory, but that approach also has flaws (concurrent access from processes running in parallel).

Here is another approach which could possibly be included in `rstantools`: the idea is to wrap the functions to be exposed into a Stan model. I noticed that `rstantools` already does a lot to expose these functions to R via Rcpp attributes, but not all pitfalls are handled, which is why I hacked up the function below. It may serve as a template for a new feature in rstantools:

```r
#' Creates an R package under the location of path which has merely
#' the provided Stan functions as source code. The code gets compiled
#' and loaded via devtools load_all. The user gets returned back an
#' environment containing the functions callable from R. Note that the
#' functions do not get attached to the global environment. By default
#' overwrite=TRUE such that whenever the stan functions change a new
#' version of the package is created from scratch.
load_stan_functions <- function(path, ..., overwrite=TRUE) {
    stan_functions_str <- paste(c(...), collapse="\n")
    if(stan_functions_str == "") {
        message("No Stan functions found.")
        return(invisible(new.env()))
    }
    stan_functions_code <- paste0(c("functions {\n", stan_functions_str, "\n}\n"))
    stan_file <- cmdstanr::write_stan_file(stan_functions_code)
    pkg_stan_file <- file.path(path, "inst", "stan", basename(stan_file))
    if(file.exists(pkg_stan_file) & tools::md5sum(pkg_stan_file) == tools::md5sum(stan_file)) {
        message("Stan function file already exists! Nothing to do!")
    } else {
        ## package exists, but content is different => clean it if this is requested; or error
        if(overwrite) {
            unlink(path, recursive=TRUE, force=TRUE)
        } else {
            stop("Stan functions have changed and not permitted to overwrite.")
        }
    }
    if(!dir.exists(path)) {
        old_opts <- options(usethis.allow_nested_project=TRUE, usethis.quiet=TRUE)
        suppressMessages(rstantools::rstan_create_package(path, rstudio=FALSE, stan_files=stan_file,
                                                          license=FALSE, open=FALSE))
        options(old_opts)
        rpkg <- basename(path)
        ns <- file.path(path, "NAMESPACE")
        cat("# Generated by roxygen2: do not edit by hand\n", file=ns)
        cat("import(Rcpp)\n", file=ns, append=TRUE)
        cat("import(methods)\n", file=ns, append=TRUE)
        cat(paste0("useDynLib(", rpkg, ", .registration = TRUE)\n"), file=ns, append=TRUE)
        pkgbuild::compile_dll(path, compile_attributes=TRUE, quiet=TRUE, debug=FALSE)
        ## rewrite RcppExports so that Stan functions have working
        ## default arguments for the output stream, rng and lp
        ## arguments...would be great to have this done again whenever
        ## RcppExports changes, but not really needed.
        rcpp_exposed_functions <- new.env()
        rcpp_exports_file <- file.path(path, "R", "RcppExports.R")
        source(rcpp_exports_file, local=rcpp_exposed_functions)
        compiled_functions <- ls(rcpp_exposed_functions)
        ## WARNING: rng functions will by default always create a new rng
        ## object each time the function is called. It would be better
        ## to setup once an rng upon package load and then by default
        ## point there. Also note that each created rng uses the
        ## default seed of 0!
        for (x in compiled_functions) {
            FUN <- get(x, envir = rcpp_exposed_functions)
            args <- formals(FUN)
            if ("pstream__" %in% names(args)) args$pstream__ <- quote(rstan::get_stream())
            if ("lp__" %in% names(args)) args$lp__ <- 0
            if ("base_rng__" %in% names(args)) {
                message("Function ", x, " uses a random number generator.\n",
                        "By default an rng instance is created for each function invocation with seed 0.\n",
                        "It is recommended to use the base_rng__ argument explicitly, see ?rstan::get_rng for details.")
                args$base_rng__ <- quote(rstan::get_rng())
            }
            formals(FUN) <- args
            assign(x, FUN, envir = rcpp_exposed_functions)
        }
        rcpp_exports_comment <- c(grep("^#", readLines(rcpp_exports_file), value=TRUE), "")
        rcpp_exports <- file(rcpp_exports_file, open="wt")
        writeLines(rcpp_exports_comment, con=rcpp_exports)
        for(x in compiled_functions) {
            cat(x, "<- ", file=rcpp_exports)
            dput(get(x, envir = rcpp_exposed_functions), file=rcpp_exports)
        }
        close(rcpp_exports)
    }
    message("Loading Stan package for functions from path ", path)
    invisible(devtools::load_all(path, attach=TRUE, quiet=TRUE)$env)
}
```

With this defined, one can easily write any custom functions to disk as a simple package. Compilation then happens only once. Every subsequent load uses the compiled binaries and instantly returns an environment with the exposed Stan functions.

Hopefully this can be cast into a rstantools feature.

andrjohns · January 22, 2024, 4:03pm

Automatically exposing Stan functions in a package is already supported in rstantools. Any .stanfunctions files in the package’s inst/stan directory will have the included functions compiled and exported in the package. This is currently being used by the lgpr and rmdcev packages.

wds15 · January 23, 2024, 8:24am

Cool!

Looking at things, it appears to me that my function would still be useful. The thing is that the stream and rng arguments need special attention. The function I wrote automatically sets up the formals of these functions by rewriting RcppExports. This is a bit of a hack, but it seems to work very reliably and integrates with the overall framework. It would be good to solve the difficulty with the stream and rng in a more principled way (like having a single package-internal instance of the rng); maybe that could be included in rstantools. Also, the rstantools create-package function currently does not accept .stanfunctions files. I am also not sure where these things are documented; at least I was not able to find them quickly.
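
The formals trick can be tried in plain R without compiling anything. In the sketch below, my_stan_rng stands in for an Rcpp-exported Stan function, and make_rng and make_stream stand in for rstan::get_rng() and rstan::get_stream(); all of these names, and the helper with_defaults, are hypothetical.

```r
# Stand-ins for rstan::get_rng() and rstan::get_stream().
make_rng <- function() "rng"
make_stream <- function() "stream"

# Stand-in for an exported Stan function: the extra arguments have no
# defaults, so calling it with only `n` would fail once they are used.
my_stan_rng <- function(n, base_rng__, pstream__) {
  list(n = n, rng = base_rng__, stream = pstream__)
}

# Hypothetical helper: give default expressions to those arguments in
# `defaults` that the function actually has.
with_defaults <- function(fun, defaults) {
  args <- formals(fun)
  for (nm in intersect(names(defaults), names(args))) {
    args[[nm]] <- defaults[[nm]]
  }
  formals(fun) <- args
  fun
}

# quote() stores the call itself, so it is evaluated on each invocation.
my_stan_rng <- with_defaults(
  my_stan_rng,
  list(base_rng__ = quote(make_rng()), pstream__ = quote(make_stream()))
)

str(my_stan_rng(3))
```

Because the default is an unevaluated call, a fresh rng is created on every invocation unless the caller passes base_rng__ explicitly, which is exactly the caveat raised about seeds.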

Just a few points for consideration (great work in any case!).

bifouba · April 9, 2024, 3:37pm

Thanks. That’s already been an extremely helpful pointer, but I’m failing to implement this on the last stretch. However, it seems that the problem might not actually be with my code, because even when I do a clean install of rmdcev and try to use one of the functions, for example rmdcev:::CalcAltOrder(1, 1), I get an Error: Expecting an external pointer: [type=integer]., which is the same error I get for my own package. The exact same thing also happens with lgpr:::STAN_var_mask(1:3, 1), for example.

The problem is not with Rcpp as such, because when I create a test package with Rcpp::Rcpp.package.skeleton and add a simple function with an argument to its rcpp_hello_world.cpp file, I can call that just fine and it works. That’s why I suspect the problem is somehow caused by how the functions from Stan are written in the rstantools .cpp file?

This is running R 4.3.3, rstantools 2.4.0, rmdcev 1.2.6, Rcpp 1.0.12
