Speeding up RStan

wds15 · June 15, 2019, 10:16pm

I followed your suggestions to look into RcppParallel for linking with the Intel TBB. That seems to work great, and we should probably make use of the TBB malloc replacement right away to speed up Stan programs. On macOS I have observed really nice speedups from linking in the libtbbmalloc_proxy library, which is distributed with RcppParallel. Here is the speedup under R on 4 cores for the warfarin example I used for StanCon:

rstan 2.18.2:
   user  system elapsed
473.629   6.424 138.410


rstan 2.18.2 with tbbmalloc_proxy from RcppParallel:
   user  system elapsed
324.721   5.540  96.020
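To put a number on the difference, the elapsed (wall-clock) times above can be turned into a speedup ratio and a fraction of time saved. This is plain arithmetic on the two timings shown; the variable names are illustrative only.

```r
# Elapsed (wall-clock) seconds from the two runs above
elapsed_default <- 138.410  # rstan 2.18.2
elapsed_tbb     <- 96.020   # rstan 2.18.2 with tbbmalloc_proxy

speedup    <- elapsed_default / elapsed_tbb      # how many times faster
time_saved <- 1 - elapsed_tbb / elapsed_default  # fraction of wall-clock time saved

round(c(speedup = speedup, time_saved = time_saved), 3)
```

This gives a speedup of about 1.44x, i.e. roughly 31% less wall-clock time for this single run.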

So this is a decent speedup, and the only thing I changed was loading the respective libraries. Right now this is not quite straightforward. What I had to do is along these lines:

In ~/.R/Makevars:
LDFLAGS += /Users/weberse2/R/2019-03-20-transient/RcppParallel/lib/libtbbmalloc.dylib /Users/weberse2/R/2019-03-20-transient/RcppParallel/lib/libtbbmalloc_proxy.dylib
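The paths in that Makevars line are specific to my machine. A minimal sketch, assuming the same macOS .dylib file names, that builds the equivalent line from wherever RcppParallel is installed on your machine (`lib_dir` and `ldflags` are just illustrative names):

```r
# Locate the lib directory of the installed RcppParallel package ("" if not installed)
lib_dir <- system.file("lib", package = "RcppParallel")
if (!nzchar(lib_dir)) warning("RcppParallel does not appear to be installed")

# Build the Makevars line with this machine's paths (macOS .dylib names, as above)
ldflags <- paste("LDFLAGS +=",
                 file.path(lib_dir, "libtbbmalloc.dylib"),
                 file.path(lib_dir, "libtbbmalloc_proxy.dylib"))
cat(ldflags, "\n")
```

After checking the printed line, it can be pasted into ~/.R/Makevars in place of the hard-coded one.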

In the R sources:

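library(rstan)

# Locate the TBB malloc libraries shipped with RcppParallel
tbbmalloc_proxy <- system.file("lib/libtbbmalloc_proxy.dylib", package="RcppParallel")
tbbmalloc <- system.file("lib/libtbbmalloc.dylib", package="RcppParallel")
tbblib <- system.file("lib/", package="RcppParallel")

# Set DYLD_LIBRARY_PATH to the RcppParallel lib directory (printing it before and after)
Sys.getenv("DYLD_LIBRARY_PATH")
Sys.setenv(DYLD_LIBRARY_PATH=tbblib)
Sys.getenv("DYLD_LIBRARY_PATH")

# Load the allocator libraries into the R session before compiling the model
dyn.load(tbbmalloc_proxy)
dyn.load(tbbmalloc)

pd_model_par_tbb <- stan_model("warfarin_pd_tlagMax_2par_generated_218_tbb.stan", verbose=TRUE)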
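To double-check that the proxy library actually got loaded into the session, one can look at the shared objects R has registered. This is a sketch using base R's `getLoadedDLLs()`; since I am not sure whether the registered name keeps the .dylib extension, it matches on the name prefix. `proxy_loaded` is an illustrative name.

```r
# Names under which R has registered every shared object loaded in this session
loaded <- names(getLoadedDLLs())

# Match on the prefix, since the registered name may or may not keep the .dylib extension
proxy_loaded <- any(startsWith(loaded, "libtbbmalloc_proxy"))

if (!proxy_loaded) {
  message("libtbbmalloc_proxy was not loaded via dyn.load() in this session")
}
proxy_loaded
```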

There is probably a better way to do this that follows R conventions, but I do not know these conventions and wanted to make it work; and voilà, on my single run here the elapsed time drops from 138 s to 96 s, roughly a 1.44x speedup (about 31% less wall-clock time). I haven’t seen these speedups on Linux and I don’t know about Windows, and possibly this model benefits a lot more than others.

So how about we make this available as an easy-to-use option in RStan?

Best,
Sebastian

bgoodri · June 15, 2019, 10:20pm

I’ll look into it. It is probably similar to how we link to StanHeaders to access the SUNDIALS shared object:

github.com/stan-dev/rstan/blob/develop/rstan/rstan/R/plugin.R#L90

  # in the file path of Rcpp's library. 
  
  # If rcpp_PKG_LIBS contains space without preceding '\\', add `\\'; 
  # otherwise keep it intact
  if (grepl('[^\\\\]\\s', rcpp_pkg_libs, perl = TRUE))
    rcpp_pkg_libs <- gsub(rcpp_pkg_path, rcpp_pkg_path2, rcpp_pkg_libs, fixed = TRUE) 


  
  list(includes = '// [[Rcpp::plugins(cpp14)]]',
       body = function(x) x,
       env = list(PKG_LIBS = paste(rcpp_pkg_libs,  
                                   paste0("-L", shQuote(StanHeaders_pkg_libs)),
                                   "-lStanHeaders"),
                  PKG_CPPFLAGS = paste(Rcpp_plugin$env$PKG_CPPFLAGS,
                                        PKG_CPPFLAGS_env_fun(), collapse = " ")))
}




# inlineCxxPlugin would automatically get registered in inline's plugin list.
# Note that every time rstan plugin is used, inlineCxxPlugin
# gets called so we can change some settings on the fly
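Following the same pattern as the StanHeaders entry in the plugin.R excerpt above (`paste0("-L", shQuote(dir))` followed by `-l` flags), a hypothetical PKG_LIBS fragment for the TBB allocator could look like the sketch below. This is not rstan's actual implementation; `tbb_lib_dir` and `tbb_pkg_libs` are made-up names, and it only covers the link step, since the dynamic loader must still be able to find the libraries at run time (as the dyn.load() calls in the first post do).

```r
# Hypothetical: directory holding the TBB libraries bundled with RcppParallel
tbb_lib_dir <- system.file("lib", package = "RcppParallel")

# Mirror the StanHeaders pattern: a quoted -L search path, then the libraries to link
tbb_pkg_libs <- paste(paste0("-L", shQuote(tbb_lib_dir)),
                      "-ltbbmalloc", "-ltbbmalloc_proxy")
tbb_pkg_libs
```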

wds15 · June 17, 2019, 12:38pm

Hi @bgoodri!

Have a look at the very bottom of the PR linked below. There you can see that we get some neat speedups from linking in the tbbmalloc_proxy library: 6% on average and up to 18% for some models. Again, this comes for free, since the only change is linking against the scalable memory allocator from the TBB.

bgoodri · June 17, 2019, 2:18pm

I do love free.

wds15 · June 17, 2019, 2:35pm

Oh… it’s better than that: free and good!
