Exposed Stan functions for generated quantities in an R package


#1

I am working on an R package that fits models for sensitivity analysis. Based on the user’s scientific question, they may want some generated quantities and not others. I have post-processing functions written as Stan functions, but I cannot figure out how to expose and export them to use inside my R functions. Should I be calling expose_stan_functions in the package somehow? Is there another way?

I also recall from StanCon that standalone generated quantities were on the way or maybe even a completed feature – this could also work as a post-processing strategy. Have those made it into rstan?


#2

The dedicated facility for this is on its way into rstan 2.18.

It sounds to me as if you want to use expose_stan_functions to generate a cpp file, which you then manually pull out and put into your src folder. Things work rather magically using Rcpp properties or whatever it's called; just follow the Rcpp documentation and it should be straightforward to figure out.
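A rough sketch of that workflow (the file path and function are made up for illustration; this assumes rstan is installed and a C++ toolchain is available):

```r
library(rstan)

# Suppose inst/stan/helpers.stan contains only a functions block, e.g.:
#   functions {
#     real my_post_process(real alpha, real beta) {
#       return alpha * beta;  // hypothetical post-processing quantity
#     }
#   }

# Compiling exposes the Stan functions as R functions in the global
# environment, and generates the underlying C++ along the way.
expose_stan_functions(stanc("inst/stan/helpers.stan"))

# The generated .cpp (which carries the annotations Rcpp needs) can then
# be copied into src/ so the package build compiles it once, up front.
my_post_process(0.5, 2)  # calling the exposed wrapper from R
```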


#3

I think you’re talking about Rcpp attributes. I hadn’t considered trying to grab the generated C++ code for Rcpp. That seems like it could work. For some reason I was thinking that documenting it within the Stan file and adding an roxygen2 @export tag would do the trick. Now I wish I had made a note of why I expected that to work!

Will the standalone generated quantities in 2.18 have access to all of the same inputs that you can access in the normal generated quantities block? That is, will they work as if it had been run at the same time as the rest of the model (except for the RNG stream, of course)? If so, maybe I should put up with calculating a few extra generated quantities for now and then plan to switch over to standalone blocks after 2.18 comes out.


#4

Maybe your export trick works, actually. I never thought about this; it would be quite neat.

To clarify: with 2.18, a clean expose_stan_functions facility will land in rstan. What you are referring to is another facility, which allows you to rerun the generated quantities block on some posterior. That facility is in the Stan sources but has not yet been made available in the interfaces. I don't know when the rstan devs plan on exposing it.


#5

Oh, I think I may have found the source of my misunderstanding. The Stan manual says “Function documentation should follow the Javadoc and Doxygen styles. Here’s an example…” then shows a function documented with @param and @return. At no point does it ever suggest @export will work, but past success with roxygen2 has trained me to believe in the quasimagical export powers of function documentation. I didn’t even question that assumption until it didn’t work. My bad.

My very basic attempts suggest that it does not work. Would have been fun, though!
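For reference, the Javadoc/Doxygen-style documentation the manual describes looks roughly like this (a minimal made-up example; note there is no @export tag anywhere, and roxygen2 never sees the Stan file):

```stan
/**
 * Return the log odds of a probability.
 *
 * @param p probability, must be in (0, 1)
 * @return log(p / (1 - p))
 */
real my_logit(real p) {
  return log(p / (1 - p));
}
```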


#6

Hey, I implemented the current (not sure if it's out yet) magic for expose_stan_functions and wouldn't mind doing it for the generated quantities, since it would basically allow us to do fairly arbitrary post-processing. But could you describe your use case better? I don't understand whether you want to compile C++ into your generated R package, or whether you want something that compiles on the fly, etc. Use cases make it much easier to design how something ends up implemented.


#7

Sure! For background, I work in causal inference where analyses can often be separated into “statistical modeling” and “causal modeling” stages. The statistical models usually look like typical regression models. The causal stage might be a sensitivity analysis or it could be the calculation of some causal estimand that depends on regression parameters (e.g., a natural direct effect for mediation). The causal stage is really where I see major potential. I guess I imagine these functions prewritten/compiled with the rest of an rstan package, but I’m not completely sure.

To go a bit more in depth, here are two broad use cases and their inputs.

  1. Post-processing for functions of parameters and possibly data. Let’s say I am doing a mediation analysis so I fit two logistic regression models simultaneously, giving parameter vectors alpha and beta. Depending on the situation, I could be interested in the natural direct effect, the natural indirect effect, the total effect, etc. These might just be functions of alpha and/or beta. But for certain model specifications, these estimands are also a function of the distribution of baseline covariates. That means I might want to bootstrap the input data or do other simulation-based estimation. Inputs would be parameter draws + model data (possibly).

  2. Sensitivity analyses with a parameter completely uninformed by data. I can fit a model to observed data, but there is some completely unidentifiable sensitivity parameter that I want to vary in order to see the different implications for a causal quantity. I do not want to have to redo sampling of the data-informed parameters every time I want to try a new value of the sensitivity parameter. (This functionality might be used to make interactive sensitivity plots.) Inputs would be parameter draws + model data (possibly) + new “data” values (the sensitivity parameters).

I can already do these things in R. Besides the speed of C++, the main benefit of keeping this all within Stan is that it would port more easily to PyStan or CmdStan. This was probably way more detail than you wanted, but I’m really excited about this feature!


#8

Thanks for the thorough description. This is really similar to what I think of as best practice, and I can see the benefit of wanting to push computations out of R code.

Until we have the generated quantities stuff plumbed through, I suggest writing Stan functions such that they take a vector[] theta to represent the parameters and loop over the iterations in theta to do whatever transformations you need. You can use the C++ generated by the current expose_stan_functions, or the R/C++ wrapper it generates.
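A minimal sketch of that pattern (the function name and the quantity computed are hypothetical): a Stan function that takes the posterior draws as an array of vectors, one per iteration, and returns one post-processed value per draw.

```stan
functions {
  /**
   * Compute a post-processed quantity for each posterior draw.
   *
   * @param theta array of parameter vectors, one vector per iteration
   * @return vector of post-processed values, one per iteration
   */
  vector post_process(vector[] theta) {
    int S = size(theta);
    vector[S] out;
    for (s in 1:S)
      out[s] = theta[s][1] * theta[s][2];  // made-up estimand for illustration
    return out;
  }
}
```

Once exposed via expose_stan_functions, post_process can be called from R on the draws extracted from a stanfit object.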

A few tricks:

  1. Use the 2.18 version of rstan (currently only available as the develop branch on GitHub). The current version still does a bunch of transformations in R code that sometimes cause problems, so there's less to understand with the new version.

  2. Call expose_stan_functions with the cacheDir argument set so that you can find all the generated code to borrow; it should already have all the tags Rcpp needs to compile it.

  3. If you want to make your life easier with the package, rely on expose_stan_functions directly in your code; don't try to compile it yourself. Almost all the generated code is now produced by either Rcpp or stan-dev/stan code, so there are more people interested in making sure it compiles.
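The cacheDir trick from point 2 might look like this in practice (the paths and Stan file are made up; cacheDir is forwarded through to Rcpp's compilation machinery):

```r
library(rstan)

# Keep the generated C++ around in a known directory so it can be
# inspected and, if desired, copied into a package's src/ folder.
expose_stan_functions(stanc("inst/stan/helpers.stan"),
                      cacheDir = "generated_cpp")

# generated_cpp/ now contains the .cpp with the Rcpp annotations already
# in place, so it should compile as-is.
list.files("generated_cpp", recursive = TRUE)
```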