Segfault, Makevars, and size of compiled code file

I am running Ubuntu 20.04 with R 4.0.2 and rstan 2.19.3 installed as a Debian package. I have been using rstan very successfully, but I have noticed that the compiled rstan::stan_model objects are about 20MB when saved as .rds files. Up until now I had not defined ~/.R/Makevars.

Following https://github.com/stan-dev/rstan/wiki/Installing-RStan-on-Linux, I re-ran rstan::stan_model after changing ~/.R/Makevars to the following (including a blank line at the beginning):

CXX14FLAGS=-O3 -march=native -mtune=native -fPIC
CXX14=g++

stan_model ran without error and produced a nice compact 1.4MB .rds file. But now I get a segfault when sampling:

 *** caught segfault ***
address 0x7fec8f1a2508, cause 'memory not mapped'

Traceback:
 1: .Call(list(name = "CppObject__finalize", address = <pointer: 0x555573d889a0>,     dll = list(name = "Rcpp", path = "/usr/lib/R/site-library/Rcpp/libs/Rcpp.so",         dynamicLookup = TRUE, handle = <pointer: 0x5555748dc290>,         info = <pointer: 0x555568aa9c70>), numParameters = 2L),     <pointer: 0x55556d6e30c0>, .pointer)
 2: x$.self$finalize()
 3: (function (x) x$.self$finalize())(<environment>)
An irrecoverable exception occurred. R is aborting now ...

My Stan code is here: https://github.com/harrelfe/stan/blob/master/lrmconppo.stan

Before defining Makevars I was getting a different segfault with one model, which made me want to use Makevars. That was an “invalid permission” segfault.

Any help appreciated.

Do you have an example or fake dataset I can test the code with? If I can reproduce the segfault, it is likely a code issue; if not, there may be a configuration issue to track down.

Hi @harrelfe. The (un)serialization process is very brittle with Stan programs. Basically, if you are just working interactively, the best way to do it is to call

library(rstan)
rstan_options(auto_write = TRUE)

or

library(rstan)
stan_model("lrmconppo.stan", auto_write = TRUE)

so that it saves a good .rds file in the same directory as lrmconppo.stan and uses that as long as lrmconppo.stan does not change. But putting the compiled code into an R package for others to use is another kettle of fish.

Hi Ben,

I have had bad luck with auto_write=TRUE when writing long reports, with rstan frequently recompiling code when it hasn’t changed.

Up until now I have not had any problem restoring rstan models with readRDS(). But the serialization problem you mentioned may explain the problem below.

Here is a self-contained test that works when I use the stan_model object directly but not when I write it out and read it back in (segfault, memory not mapped). Earlier today I could not get the second model to fail unless the first model was included in the script before it.

require(rstan)
options(mc.cores = parallel::detectCores())
dat1 <- readRDS(url('https://hbiostat.org/attach/dat1.rds'))
s1 <- readLines('https://raw.githubusercontent.com/harrelfe/stan/master/lrmppo.stan')
m1 <- stan_model(model_code=s1)
saveRDS(m1, '/tmp/m1.rds')
m1r <- readRDS('/tmp/m1.rds')
identical(m1, m1r)                     # FALSE
g <- rstan::sampling(m1r, data=dat1)   # works when m1 is used in place of m1r

d  <- readRDS(url('https://hbiostat.org/attach/bcppodat.rds'))
s2 <- readLines('https://raw.githubusercontent.com/harrelfe/stan/master/lrmconppo.stan')
m2 <- stan_model(model_code=s2)
saveRDS(m2, '/tmp/m2.rds')
m2r <- readRDS('/tmp/m2.rds')
identical(m2, m2r)                     # FALSE
f <- rstan::sampling(m2r, data=d)      # works when m2 is used in place of m2r

I ran this with the compiler flags in .R/Makevars listed earlier, which produced small .rds files. The program also bombs when I remove those flags and get the large .rds files.
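A side note on the `identical(...)  # FALSE` lines: as far as I know, `identical()` compares environments by identity, and a stanmodel object carries closures and environments (for instance, its `mk_cppmodule` slot is a function). So `FALSE` there is expected even for a faithful round trip and does not by itself show that the .rds is corrupt. A minimal base-R illustration, using a made-up list as a stand-in for a stanmodel:

```r
# Stand-in for a stanmodel: plain data plus an environment
# (a stanmodel holds closures/environments of this kind).
e <- new.env()
assign("x", 1, envir = e)
obj <- list(code = "model { }", env = e)

path <- tempfile(fileext = ".rds")
saveRDS(obj, path)
obj2 <- readRDS(path)

# readRDS() builds a brand-new environment, and identical() compares
# environments by identity, so the whole objects differ ...
identical(obj, obj2)                              # FALSE
# ... even though the data and the environment's contents survived.
identical(obj$code, obj2$code)                    # TRUE
identical(as.list(obj$env), as.list(obj2$env))    # TRUE
```

In other words, `identical()` is not a useful test of whether a compiled model survived saveRDS/readRDS; whether sampling from the restored object works is the real test.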

What is a failsafe way to store compiled Stan code in R?

The only failsafe way is to package it. But for use on one computer, auto_write = TRUE works well as long as the version of rstan has not changed since it was serialized. If rstan has changed, then you need to delete the .rds files and call stan_model again with auto_write = TRUE. Also, in RMarkdown files, putting cache = TRUE in the chunks that do MCMC works reliably.
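The "delete the .rds files if rstan has changed" step could be automated along these lines. This is a hypothetical helper (`clear_cache_if_pkg_changed` and its stamp file are made up for illustration and are not part of rstan): it records the installed version of a package in a small stamp file and deletes the cached .rds files in a directory whenever the installed version no longer matches the stamp.

```r
# Hypothetical helper, not part of rstan.
# Deletes cached .rds files in `dir` when the installed version of `pkg`
# differs from the version recorded in `stamp`, then records the current
# version. Returns TRUE if the cache was cleared, FALSE otherwise.
clear_cache_if_pkg_changed <- function(dir, pkg = "rstan",
                                       stamp = file.path(dir, ".pkg_version")) {
  current <- as.character(utils::packageVersion(pkg))
  # No stamp yet: treat as "changed" (we cannot know what built the cache)
  previous <- if (file.exists(stamp)) readLines(stamp, warn = FALSE) else ""
  changed <- !identical(previous, current)
  if (changed) {
    rds <- list.files(dir, pattern = "\\.rds$", full.names = TRUE)
    unlink(rds)
    writeLines(current, stamp)
  }
  changed
}
```

For example, calling it on the directory that holds the .stan files before `stan_model(..., auto_write = TRUE)` would force a recompile after an rstan upgrade. Note that on the very first call, with no stamp yet, it conservatively deletes any existing .rds files.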

I’ve tried both of those methods without success. cache=TRUE creates a 1GB cache file for a long report, and auto_write has not been reliable for me (it senses changes that aren’t there). I see now how rstanarm does this in its stanmodels.R file. I assume that is the “package it” approach. I hope that will work without the other complexities of rstanarm/src. It would be simpler if there were reliable save and load functions that work with arbitrary objects. I wonder how R saves exact images of objects when packages are compiled.

Packages have the compiled code in the .so / .dll rather than a .rds file. See

https://mc-stan.org/rstantools/articles/index.html

That is helpful but seems to apply to packages that make rstan mandatory. In the rms package, rstan is optional and I have users run rms::stancompile() one time to compile all the Stan code (in Github) and store each object in an .rds file.

That is true. But users will have to install a C++ toolchain to call rms::stancompile and serialize the .rds objects, which many of them won’t be able to do. And once they do that, it will recompile if any of the following conditions holds:

  • the .rds file does not exist
  • the .rds file has a modification time earlier than the release of rstan
  • the .rds file is not valid
  • the hash of the Stan code stored in the .rds file does not match the hash of the current .stan code
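The four triggers above can be expressed as one check. This is only a sketch of the logic, not rstan's actual implementation: `needs_recompile` is a hypothetical name, the cached .rds is assumed to hold a list with elements `object` and `hash` (an MD5 of the .stan file), and the rstan release time is passed in as `rstan_time` rather than looked up.

```r
# Hypothetical sketch of the four recompile triggers (not rstan's code).
# Assumes the .rds holds list(object = <compiled model>, hash = <md5 of .stan>).
needs_recompile <- function(stan_file, rds_file, rstan_time) {
  # 1. the .rds file does not exist
  if (!file.exists(rds_file)) return(TRUE)
  # 2. the .rds file was last modified before rstan was released
  if (file.mtime(rds_file) < rstan_time) return(TRUE)
  # 3. the .rds file cannot be read or lacks the expected hash field
  cached <- tryCatch(readRDS(rds_file), error = function(e) NULL)
  if (!is.list(cached) || is.null(cached[["hash"]])) return(TRUE)
  # 4. the stored hash differs from the hash of the current .stan file
  !identical(cached[["hash"]], unname(tools::md5sum(stan_file)))
}
```

One rough proxy for `rstan_time`, assuming the install directory's timestamp is meaningful on your system, is `file.mtime(system.file(package = "rstan"))`.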

Thanks for that. I’d like to keep things simpler. As a side question: when I create a stanmodels object as in rstanarm’s stanmodels.R code, which uses rstan::stanc, I get an object that is too small to contain any compiled code, and I get an error when running rstan::sampling on an element of the stored object:

Error in get(paste0("model_", model_cppname)) : 
  object 'model_lrmconppo' not found

It’s confusing because I can’t find anything in rstanarm code that indicates that the stanmodels object is augmented after it is created.

If everyone were always using the rms package with knitr, I would be tempted to have an initial setup chunk that compiles all the Stan code in rms, stores it in a list, and caches only that chunk. But faithful serialization would solve all of this.

stanc just generates the C++ code. At some point, stan_model has to be called on that C++ code in order to compile it, which is what rstantools facilitates, following the approach worked out for rstanarm.


Thanks Ben. If you think of a way to do a “partial rstantools” approach for a package’s optional use of rstan, please pass it along. In the meantime I’ll explore these options:

  • Recommending knitr with cache=TRUE for the chunk that compiles the Stan code, hoping that the cache file always serializes R objects faithfully
  • Putting the Bayesian modeling functions in a separate package that requires rstan, and may end up requiring rstanarm too, with pre-compiled Stan code only. For now I hesitate to do that because the Bayesian fitting functions are used with many rms functions, not all of them exported.

For a while, the prophet package compiled its Stan program at installation time (i.e., it wasn’t optional), which at least ensures that the timestamps are right, but a few months ago they wisely decided to go the rstantools route.

There is an example of caching the compiled model in a .Rmd file at