Avoid recompiling the exact same Stan model with brms

paul.buerkner · October 21, 2018, 5:56pm

This is primarily a question to @bgoodri I guess.

Since rstan has a mechanism to figure out if the same Stan model was already compiled before in the same R session, I want to make use of this mechanism in brms as well.

However, no matter what model I refit twice, I always get
"recompiling the Stan model to avoid crashing the R session"
which I believe stems from the current brms workflow to generate the Stan code, which is as follows:

In make_stancode, parse the Stan model for the first time via

rstan::stanc_builder(
  file = temp_file, isystem = isystem,
  obfuscate_model_name = TRUE
)

to make sure that all #include statements are resolved. To brm, I then only return .$model_code, which is then put into rstan::stan_model to compile the Stan model.

My question is: What do I need to change in order to allow rstan to re-use an existing model without compiling it again?

bgoodri · October 21, 2018, 6:26pm

I think rstan::stanc_builder should not be used in new (or autogenerated) code now that it is possible to #include in stanc directly without first pre-processing the file in R. However, getting it to work without an error is very difficult for a human. The # has to be flush left, the path to the file probably cannot contain spaces or any non-ASCII characters, and there can be nothing to the right of the path (not even whitespace or comments). There are examples of how to do it in rstanarm such as

github.com/stan-dev/rstanarm/blob/master/src/stan_files/polr.stan#L142


* @param w Vector (see reference manual)
* @param v Integer array (see reference manual)
* @param u Integer array (see reference manual)
* @param b Vector that is multiplied from the left by the CSR matrix
* @return A vector that is the product of the CSR matrix and b
*/
vector csr_matrix_times_vector2(int m, int n, vector w, int[] v, int[] u, vector b);
}
data {
// declares N, K, X, xbar, dense_X, nnz_x, w_x, v_x, u_x
#include /data/NKX.stan
int<lower=2> J;             // number of outcome categories, which typically is > 2
int<lower=1,upper=J> y[N];  // ordinal outcome
// declares prior_PD, has_intercept, link, prior_dist, prior_dist_for_intercept
#include /data/data_glm.stan
// declares has_weights, weights, has_offset, offset
#include /data/weights_offset.stan


// hyperparameter values
real<lower=0> regularization;
vector<lower=0>[J] prior_counts;

So I would do it like that and pass isystem = system.file("chunks", package = "brms") to stanc.
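The formatting rules described above for #include lines can be checked mechanically before handing code to stanc. The sketch below is a minimal base-R check, assuming the rules exactly as stated in this post (flush-left #, an ASCII path without spaces, nothing after the path). check_include_lines is a hypothetical helper, not part of rstan or brms, and stanc itself may accept or reject other cases.

```r
# Hypothetical helper (not part of rstan or brms): list every #include
# directive in a Stan program and flag those that break the rules above.
check_include_lines <- function(model_code) {
  lines <- unlist(strsplit(model_code, "\n", fixed = TRUE))
  # directives, possibly indented; commented-out includes are ignored
  is_include <- grepl("^[[:space:]]*#include", lines)
  # '#' flush left, one space, a path made only of printable non-space
  # ASCII characters, and nothing at all after the path
  ok <- grepl("^#include [\\x21-\\x7E]+$", lines, perl = TRUE)
  data.frame(
    line = which(is_include),
    text = lines[is_include],
    ok = ok[is_include],
    stringsAsFactors = FALSE
  )
}

# Example program: one valid include followed by three problematic ones
code <- paste(
  "data {",
  "#include /data/NKX.stan",
  "  #include foo.stan",
  "#include my file.stan",
  "#include foo.stan // trailing comment",
  "}",
  sep = "\n"
)
res <- check_include_lines(code)
print(res)
```

Only the rows with ok equal to FALSE need fixing by hand; the check does not try to resolve the paths against isystem.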

paul.buerkner · October 21, 2018, 6:55pm

Thanks! The problem I see is that users may want to use the Stan code generated by brms on its own, and there the #include statements won’t help. I need to resolve them right away; otherwise the generated Stan code may be unusable outside of brms. Or is there a workaround via stanc as well?

bgoodri · October 21, 2018, 7:15pm

I am not totally sure I understand. But you could do
#include full/path/to/brms/chunks/foo.stan
and not rely on isystem. That would allow make_stancode to generate Stan code that would compile on any computer that has brms installed but if there is a space in the full path, it might not work. You could try putting quotation marks around it.

paul.buerkner · October 21, 2018, 7:19pm

I want the brms generated Stan code to be self-contained and portable, which doesn’t seem to be the case with the new stanc way of handling #include. Also I couldn’t manage to solve the "recompiling the Stan model to avoid crashing the R session" problem with it (now it doesn’t even recognize that the same model was compiled already), but maybe I haven’t tried hard enough.

bgoodri · October 21, 2018, 8:10pm

I think the reason it wants to recompile is that stan_model looks for stanfit and stanmodel S4 objects rather than looking inside some arbitrary S3 object for a stanfit:

github.com/stan-dev/rstan/blob/develop/rstan/rstan/R/rstan.R#L68


                   model_name = model_name, verbose = verbose,
                   obfuscate_model_name = obfuscate_model_name, 
                   allow_undefined = allow_undefined)


# find possibly identical stanmodels
model_re <- "(^[[:alnum:]]{2,}.*$)|(^[A-E,G-S,U-Z,a-z].*$)|(^[F,T].+)"
if(!is.null(model_name))
  if(!grepl(model_re, model_name))
    stop("model name must match ", model_re)
S4_objects <- apropos(model_re, mode="S4", ignore.case=FALSE)
if (length(S4_objects) > 0) {
  e <- environment()
  stanfits <- sapply(mget(S4_objects, envir = e, inherits = TRUE), 
                     FUN = is, class2 = "stanfit")
  stanmodels <- sapply(mget(S4_objects, envir = e, inherits = TRUE), 
                       FUN = is, class2 = "stanmodel")
  if (any(stanfits)) for (i in names(which(stanfits))) {
    obj <- get_stanmodel(get(i, envir = e, inherits = TRUE))
    if (identical(obj@model_code[1], stanc_ret$model_code[1])) return(obj)
  }
  if (any(stanmodels)) for (i in names(which(stanmodels))) {

I think brms would need to check the tools::md5sum of the generated Stan program like

github.com/stan-dev/rstan/blob/develop/rstan/rstan/R/rstan.R#L48


#     by using returned results from stanc. 
#   model_code: if file is not specified, we can used 
#     a character to specify the model.   


if (is.null(stanc_ret)) {
  model_name2 <- deparse(substitute(model_code))
  if (is.null(attr(model_code, "model_name2")))
    attr(model_code, "model_name2") <- model_name2
  if (missing(model_name)) model_name <- NULL 
  
  if(missing(file)) {
    tf <- tempfile()
    writeLines(model_code, con = tf)
    file <- file.path(dirname(tf), paste0(tools::md5sum(tf), ".stan"))
    if(!file.exists(file)) file.rename(from = tf, to = file)
    else file.remove(tf)
  }
  else file <- normalizePath(file)
  
  stanc_ret <- stanc(file = file, model_code = model_code, 
                     model_name = model_name, verbose = verbose,

and then check if there is a corresponding file with a .rds extension. If so, readRDS that as your stanmodel instead of calling stan_model.

I’ll work on generating portable Stan files. I think we will have to call the preprocessor on the generated code.

bgoodri · October 22, 2018, 1:39am

I added some code to ask the C++ preprocessor to generate a text file with the #includes replaced by the files they include.

+  if (grepl("#include", model_code, fixed = TRUE)) {
+    model_code <- scan(text = model_code, what = character(), sep = "\n", quiet = TRUE)
+    model_code <- gsub('#include /', '#include ', model_code, fixed = TRUE)
+    model_code <- gsub('#include (.*$)', '#include "\\1"', model_code)
+    unprocessed <- tempfile(fileext = ".stan")
+    processed <- tempfile(fileext = ".stan")
+    on.exit(file.remove(c(unprocessed, processed)))
+    writeLines(model_code, con = unprocessed)
+    ARGS <- paste("-E -nostdinc -x c++ -P -C", paste("-I", isystem, " ", collapse = ""), 
+                  "-o", processed, unprocessed)
+    pkgbuild::with_build_tools(system2(CXX, args = ARGS))
+    if (file.exists(processed)) model_code <- paste(readLines(processed), collapse = "\n")
+  }

@Bob_Carpenter This would be easier and less R-specific if stanc had an option to output the Stan code after replacing all the #include statements. Also, if it were less particular about whether the directory contains a trailing slash, whether the filename can be quoted, etc.

paul.buerkner · October 22, 2018, 7:24am

Thanks Ben! I have now written the following helper function to check if a compiled Stan model already exists:

get_stan_model <- function(model_code, args) {
  model_code <- as_one_character(model_code)
  stopifnot(is.list(args))
  # check if a compiled version of the Stan model already exists
  tf <- tempfile()
  writeLines(model_code, con = tf)
  file <- file.path(dirname(tf), paste0(tools::md5sum(tf), ".rds"))
  file.remove(tf)
  if (file.exists(file) && !length(args)) {
    message("Using the compiled C++ model")
    out <- readRDS(file)
  } else {
    message("Compiling the C++ model")
    args$model_code <- model_code
    out <- do.call(rstan::stan_model, args)
  }
  out
}

This function works in the sense that the second time the model is fit, it uses the compiled C++ model. Unfortunately, the R session crashes immediately afterwards. Apparently, rstan had its reasons to throw "recompiling the Stan model to avoid crashing the R session", but what could that reason be and how can I avoid it?

bgoodri · October 22, 2018, 2:29pm

Basically, you are going to have to dyn.unload the dynamically loaded module before trying to dynamically reload it via readRDS. If brms is not doing anything with the stanfit object like calling log_prob, grad_log_prob, etc., then it is possible that brm could dyn.unload the dynamically loaded module from the stanfit object before returning, and then it would “always” be safe to dynamically reload it in get_stan_model.

Calling dyn.unload is not easy. Its argument is “a character string giving the pathname to a DLL”, which is stored (minus its extension, which is .dll on Windows and .so otherwise) in the dso_filename slot of a cxxdso object, which in turn is in the dso slot of a stanmodel. So, it would be something like dyn.unload(file.path(tempdir(), paste0(model@dso@dso_filename, .Platform$dynlib.ext))).
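The path construction described above can be wrapped in a small helper. This is a sketch only: dso_path is a hypothetical name, and the assumption that the DSO sits in tempdir() comes from the description above rather than from anything verified here. The commented usage assumes a compiled stanmodel called model.

```r
# Hypothetical helper: build the path that dyn.unload() expects from a
# DSO file name that is stored without its platform-specific extension.
dso_path <- function(dso_filename, dir = tempdir()) {
  file.path(dir, paste0(dso_filename, .Platform$dynlib.ext))
}

# Assumed usage with a compiled stanmodel `model` (requires rstan):
# path <- dso_path(model@dso@dso_filename)
# if (file.exists(path)) dyn.unload(path)

# The helper itself only needs base R:
example_path <- dso_path("file1234abcd")
print(basename(example_path))
```

Guarding the call with file.exists() avoids an error from dyn.unload when the file is not where this sketch assumes it is.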

paul.buerkner · October 22, 2018, 7:32pm

That doesn’t sound like a lot of fun, but thanks for the detailed explanation.

One more question: If I have to unload the module before loading it via readRDS,
how do I find the dso_filename of the model?

bgoodri · October 22, 2018, 8:37pm

I was thinking that you would do so right before brm terminates, in which case you have the model in scope and can do dyn.unload(file.path(tempdir(), paste0(model@dso@dso_filename, .Platform$dynlib.ext))).

paul.buerkner · October 22, 2018, 10:00pm

I see. What functionality would we lose by unloading the DSO?

bgoodri · October 22, 2018, 10:02pm

Everything documented under ?log_prob.

bgoodri · October 22, 2018, 10:03pm

bridgesampling might be using unconstrain_pars

paul.buerkner · October 22, 2018, 10:07pm

Yes, bridgesampling was one of the things I had in mind that will likely fail, but maybe I can find a workaround for that.

bgoodri · October 22, 2018, 10:09pm

For rstanarm, the bridgesampling method requires the diagnostic_file argument be specified originally so that the unconstrained parameters are written to the disk. Then bridgesampling fishes them out of that CSV file.

paul.buerkner · October 23, 2018, 9:35am

Thank you so much, Ben, for your insights!

To be honest, I would prefer keeping my hands off of things I don’t thoroughly understand, and the DSO is certainly one such thing.

In an earlier post, you said that rstan is looking for stanfit objects (presumably in the global env) to take the DSOs from. Indeed, the following seems to work without recompilation:

library(brms)
# requires compilation on the first trial
fit <- brm(count ~ Trt, epilepsy)
# load the stanfit object in the global environment
stanfit <- fit$fit
# does not require compilation anymore
fit2 <- brm(count ~ Trt, epilepsy)

Could rstan be convinced to also look for stanfit objects in the fit slot of brmsfit objects?

Bob_Carpenter · October 23, 2018, 3:44pm

I think it would be a bad idea to have a dependency from rstan to brms. We want to be moving to fewer dependencies, not more.

Could you instead create a Stan fit object out of your brmsfit object so it’d be in scope? You already have the dependency from brms to rstan, I think.

paul.buerkner · October 23, 2018, 3:47pm

It wouldn’t create a dependency in rstan as it would basically require the check inherits(object, "brmsfit"), which does not require brms to be installed or anything.
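To illustrate the point about inherits(): the check is a plain S3 class test in base R, so it runs without brms being installed. A minimal sketch, where find_brmsfit and the stand-in object are hypothetical and only for illustration (this is not what rstan currently does):

```r
# A stand-in object that merely carries the S3 class "brmsfit"; the `fit`
# element mimics the place where brms stores the stanfit object.
fake_fit <- structure(list(fit = "a stanfit would live here"),
                      class = "brmsfit")

# Hypothetical scan: names of all objects in an environment that inherit
# from "brmsfit". Only base R is used, so brms need not be installed.
find_brmsfit <- function(envir = globalenv()) {
  nms <- ls(envir)
  keep <- vapply(
    nms,
    function(n) inherits(get(n, envir = envir, inherits = FALSE), "brmsfit"),
    logical(1)
  )
  nms[keep]
}

# Try it on a scratch environment holding one brmsfit-like object
e <- new.env()
assign("a", fake_fit, envir = e)
assign("b", 1, envir = e)
found <- find_brmsfit(e)
print(found)
```

A search of this kind would then read the fit element of each match to reach the stanfit itself.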

Bob_Carpenter · October 23, 2018, 3:59pm

Thanks for clarifying. As long as that works without brms installed, then it should be fine dependency-wise.

Even so, I’m reluctant to start tooling our base packages for downstream packages, but in this case, I think @bgoodri should make the decision.
