Threading with backend = "rstan"

Hi all,

I am trying to use within-chain parallelization in brms with rstan as the backend. If I understand the latest brms release notes correctly, this should work with Stan versions >= 2.25.

My system is:

  • Operating System: Ubuntu 20.04
  • R: Version: 4.0.4
  • brms Version: 2.15.0
  • rstan version: 2.26.1
  • StanHeaders: 2.26.1

I am trying to run the model:

m1 <- brm(weight ~ Time, chains = 4, cores = 4, threads = threading(2), backend = "rstan", data = ChickWeight)

which fails to compile with the following output:

> m1 <- brm(weight ~ Time, chains = 4, cores = 4, threads = threading(2), backend = "rstan", data = ChickWeight)
Compiling Stan program...
During startup - Warning messages:
1: Setting LC_CTYPE failed, using "C" 
2: Setting LC_COLLATE failed, using "C" 
3: Setting LC_TIME failed, using "C" 
4: Setting LC_MESSAGES failed, using "C" 
5: Setting LC_MONETARY failed, using "C" 
6: Setting LC_PAPER failed, using "C" 
7: Setting LC_MEASUREMENT failed, using "C" 
make cmd is
  make -f '/usr/lib/R/etc/Makeconf' -f '/usr/share/R/share/make/shlib.mk' -f '/home/julian/.R/Makevars' CXX='$(CXX14) $(CXX14STD)' CXXFLAGS='$(CXX14FLAGS)' CXXPICFLAGS='$(CXX14PICFLAGS)' SHLIB_LDFLAGS='$(SHLIB_CXX14LDFLAGS)' SHLIB_LD='$(SHLIB_CXX14LD)' SHLIB='file29064b2c005a.so' OBJECTS='file29064b2c005a.o'

make would use
"/bin/g++" -std=gnu++14 -I"/usr/share/R/include" -DNDEBUG   -I"/home/julian/R/x86_64-pc-linux-gnu-library/4.0/Rcpp/include/"  -I"/home/julian/R/x86_64-pc-linux-gnu-library/4.0/RcppEigen/include/"  -I"/home/julian/R/x86_64-pc-linux-gnu-library/4.0/RcppEigen/include/unsupported"  -I"/home/julian/R/x86_64-pc-linux-gnu-library/4.0/BH/include" -I"/home/julian/R/x86_64-pc-linux-gnu-library/4.0/StanHeaders/include/src/"  -I"/home/julian/R/x86_64-pc-linux-gnu-library/4.0/StanHeaders/include/"  -I"/home/julian/R/x86_64-pc-linux-gnu-library/4.0/RcppParallel/include/"  -I"/home/julian/R/x86_64-pc-linux-gnu-library/4.0/rstan/include" -DEIGEN_NO_DEBUG  -DBOOST_DISABLE_ASSERTS  -DBOOST_PENDING_INTEGER_LOG2_HPP  -DSTAN_THREADS  -DUSE_STANC3 -DSTRICT_R_HEADERS  -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION  -DBOOST_NO_AUTO_PTR  -include '/home/julian/R/x86_64-pc-linux-gnu-library/4.0/StanHeaders/include/stan/math/prim/fun/Eigen.hpp'  -D_REENTRANT -DRCPP_PARALLEL_USE_TBB=1      -fpic  -O3 -march=native -mtune=native -fPIC -c file29064b2c005a.cpp -o file29064b2c005a.o
if test  "zfile29064b2c005a.o" != "z"; then \
  echo "/bin/g++" -std=gnu++14 -shared -L"/usr/lib/R/lib" -Wl,-Bsymbolic-functions -Wl,-z,relro -o file29064b2c005a.so file29064b2c005a.o  '/home/julian/R/x86_64-pc-linux-gnu-library/4.0/rstan/lib//libStanServices.a' -L'/home/julian/R/x86_64-pc-linux-gnu-library/4.0/StanHeaders/lib/' -lStanHeaders -L'/home/julian/R/x86_64-pc-linux-gnu-library/4.0/RcppParallel/lib/' -ltbb   -L"/usr/lib/R/lib" -lR; \
  "/bin/g++" -std=gnu++14 -shared -L"/usr/lib/R/lib" -Wl,-Bsymbolic-functions -Wl,-z,relro -o file29064b2c005a.so file29064b2c005a.o  '/home/julian/R/x86_64-pc-linux-gnu-library/4.0/rstan/lib//libStanServices.a' -L'/home/julian/R/x86_64-pc-linux-gnu-library/4.0/StanHeaders/lib/' -lStanHeaders -L'/home/julian/R/x86_64-pc-linux-gnu-library/4.0/RcppParallel/lib/' -ltbb   -L"/usr/lib/R/lib" -lR; \
fi
Error in compileCode(f, code, language = language, verbose = verbose) : 
  /home/julian/R/x86_64-pc-linux-gnu-library/4.0/StanHeaders/include/tbb/parallel_reduce.h:270:44:   required from 'void tbb::interface9::internal::start_deterministic_reduce<Range, Body, Partitioner>::run_body(Range&) [with Range = tbb::blocked_range<long unsigned int>; Body = stan::math::internal::reduce_sum_impl<model29061cad3d7c__namespace::partial_log_lik_lpmf_rsfunctor__<false>, void, stan::math::var_value<double>, const std::vector<int>&, const Eigen::Matrix<double, -1, 1, 0, -1, 1>&, const Eigen::Matrix<double, -1, -1, 0, -1, -1>&, Eigen::Matrix<stan::math::var_value<double, void>, -1, 1, 0, -1, 1>&, stan::math::var_value<double, void>&, stan::math::var_value<double, void>&>::recursive_reducer; Partitioner = const tbb::simple_partitioner]'/home/julian/R/x86_64-pc-linux-gnu-library/4.0/StanHeaders/include/tbb/partitioner.h:507:9:   required from 'void tbb::interface9::internal::simple_partition_type::execute(StartType&, Range&) [with StartType = tbb::interface9::internal
Error in sink(type = "output") : invalid connection

The stancode of the model looks as follows:

> make_stancode(weight ~ Time, chains = 4, cores = 4, threads = threading(2), data = ChickWeight)
// generated with brms 2.15.0
functions {
  /* integer sequence of values
   * Args: 
   *   start: starting integer
   *   end: ending integer
   * Returns: 
   *   an integer sequence from start to end
   */ 
  int[] sequence(int start, int end) { 
    int seq[end - start + 1];
    for (n in 1:num_elements(seq)) {
      seq[n] = n + start - 1;
    }
    return seq; 
  } 
  // compute partial sums of the log-likelihood
  real partial_log_lik_lpmf(int[] seq, int start, int end, vector Y, matrix Xc, vector b, real Intercept, real sigma) {
    real ptarget = 0;
    int N = end - start + 1;
    ptarget += normal_id_glm_lpdf(Y[start:end] | Xc[start:end], Intercept, b, sigma);
    return ptarget;
  }
}
data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  int grainsize;  // grainsize for threading
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  int Kc = K - 1;
  matrix[N, Kc] Xc;  // centered version of X without an intercept
  vector[Kc] means_X;  // column means of X before centering
  int seq[N] = sequence(1, N);
  for (i in 2:K) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
  }
}
parameters {
  vector[Kc] b;  // population-level effects
  real Intercept;  // temporary intercept for centered predictors
  real<lower=0> sigma;  // residual SD
}
transformed parameters {
}
model {
  // likelihood including constants
  if (!prior_only) {
    target += reduce_sum(partial_log_lik_lpmf, seq, grainsize, Y, Xc, b, Intercept, sigma);
  }
  // priors including constants
  target += student_t_lpdf(Intercept | 3, 103, 69.7);
  target += student_t_lpdf(sigma | 3, 0, 69.7)
    - 1 * student_t_lccdf(0 | 3, 0, 69.7);
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept - dot_product(means_X, b);
}

The models
m2 <- brm(weight ~ Time, chains = 4, cores = 4, backend = "rstan", data = ChickWeight)
and
m3 <- brm(weight ~ Time, chains = 4, cores = 4, threads = threading(2), backend = "cmdstanr", data = ChickWeight)

both work. I want to use rstan mainly because it works with loo_moment_match, which, AFAIK, is not possible with cmdstanr at the moment (?).

Does anyone know why the model with rstan + threading fails to compile?

Thanks in advance!

Julian

@hsbadr would you maybe know why?

This is a compatibility issue between the TBB headers in Stan Math and the older version of those headers in the CRAN version of RcppParallel. It was fixed in RcppParallel by pull request #151, "Intel TBB 2019 Update 8" by hsbadr (RcppCore/RcppParallel on GitHub).

It seems that RcppParallel will get a CRAN release soon. Until then, you may install the development version from GitHub, or replace tbb/parallel_reduce.h with the one in Math.

remove.packages("RcppParallel")
remotes::install_git('https://github.com/RcppCore/RcppParallel', dependencies = TRUE)
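
After reinstalling, it may help to restart R and confirm which package versions the session actually sees. The following is only a sketch in base R: it reports versions and does not tell you whether a given version contains the fix. The helper name installed_version is made up for illustration.

```r
# Packages involved in this thread; NA in the result means "not installed".
pkgs <- c("RcppParallel", "StanHeaders", "rstan", "brms")

# Hypothetical helper: look up a package version without loading the package.
installed_version <- function(pkg) {
  if (nzchar(system.file(package = pkg))) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

versions <- vapply(pkgs, installed_version, character(1))
print(versions)
```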

Thanks for your quick reply @hsbadr and for pointing to the right person @rok_cesnovar .

I tried updating to the development version of RcppParallel, as you suggested. However, it still does not work, although I believe the error message is slightly different this time:

 m2 <- brm(weight ~ Time, chains = 4, cores = 4, threads = threading(2), backend = "rstan", data = ChickWeight)
Compiling Stan program...
make cmd is
  make -f '/usr/lib/R/etc/Makeconf' -f '/usr/share/R/share/make/shlib.mk' -f '/home/julian/.R/Makevars' CXX='$(CXX14) $(CXX14STD)' CXXFLAGS='$(CXX14FLAGS)' CXXPICFLAGS='$(CXX14PICFLAGS)' SHLIB_LDFLAGS='$(SHLIB_CXX14LDFLAGS)' SHLIB_LD='$(SHLIB_CXX14LD)' SHLIB='filec0b3eb6d2dc.so' OBJECTS='filec0b3eb6d2dc.o'

make would use
"/bin/g++" -std=gnu++14 -I"/usr/share/R/include" -DNDEBUG   -I"/home/julian/R/x86_64-pc-linux-gnu-library/4.0/Rcpp/include/"  -I"/home/julian/R/x86_64-pc-linux-gnu-library/4.0/RcppEigen/include/"  -I"/home/julian/R/x86_64-pc-linux-gnu-library/4.0/RcppEigen/include/unsupported"  -I"/home/julian/R/x86_64-pc-linux-gnu-library/4.0/BH/include" -I"/home/julian/R/x86_64-pc-linux-gnu-library/4.0/StanHeaders/include/src/"  -I"/home/julian/R/x86_64-pc-linux-gnu-library/4.0/StanHeaders/include/"  -I"/home/julian/R/x86_64-pc-linux-gnu-library/4.0/RcppParallel/include/"  -I"/home/julian/R/x86_64-pc-linux-gnu-library/4.0/rstan/include" -DEIGEN_NO_DEBUG  -DBOOST_DISABLE_ASSERTS  -DBOOST_PENDING_INTEGER_LOG2_HPP  -DSTAN_THREADS  -DUSE_STANC3 -DSTRICT_R_HEADERS  -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION  -DBOOST_NO_AUTO_PTR  -include '/home/julian/R/x86_64-pc-linux-gnu-library/4.0/StanHeaders/include/stan/math/prim/fun/Eigen.hpp'  -D_REENTRANT -DRCPP_PARALLEL_USE_TBB=1
   -fpic  -O3 -march=native -mtune=native -fPIC -c filec0b3eb6d2dc.cpp -o filec0b3eb6d2dc.o
if test  "zfilec0b3eb6d2dc.o" != "z"; then \
  echo "/bin/g++" -std=gnu++14 -shared -L"/usr/lib/R/lib" -Wl,-Bsymbolic-functions -Wl,-z,relro -o filec0b3eb6d2dc.so filec0b3eb6d2dc.o  '/home/julian/R/x86_64-pc-linux-gnu-library/4.0/rstan/lib//libStanServices.a' -L'/home/julian/R/x86_64-pc-linux-gnu-library/4.0/StanHeaders/lib/' -lStanHeaders -L'/home/julian/R/x86_64-pc-linux-gnu-library/4.0/RcppParallel/lib/' -ltbb   -L"/usr/lib/R/lib" -lR; \
  "/bin/g++" -std=gnu++14 -shared -L"/usr/lib/R/lib" -Wl,-Bsymbolic-functions -Wl,-z,relro -o filec0b3eb6d2dc.so filec0b3eb6d2dc.o  '/home/julian/R/x86_64-pc-linux-gnu-library/4.0/rstan/lib//libStanServices.a' -L'/home/julian/R/x86_64-pc-linux-gnu-library/4.0/StanHeaders/lib/' -lStanHeaders -L'/home/julian/R/x86_64-pc-linux-gnu-library/4.0/RcppParallel/lib/' -ltbb   -L"/usr/lib/R/lib" -lR; \
fi
Error in compileCode(f, code, language = language, verbose = verbose) :
  /home/julian/R/x86_64-pc-linux-gnu-library/4.0/StanHeaders/include/tbb/parallel_reduce.h:270:44:   required from ‘void tbb::interface9::internal::start_deterministic_reduce<Range, Body, Partitioner>::run_body(Range&) [with Range = tbb::blocked_range<long unsigned int>; Body = stan::math::internal::reduce_sum_impl<modelc0b67329344__namespace::partial_log_lik_lpmf_rsfunctor__<false>, void, stan::math::var_value<double>, const std::vector<int>&, const Eigen::Matrix<double, -1, 1, 0, -1, 1>&, const Eigen::Matrix<double, -1, -1, 0, -1, -1>&, Eigen::Matrix<stan::math::var_value<double, void>, -1, 1, 0, -1, 1>&, stan::math::var_value<double, void>&, stan::math::var_value<double, void>&>::recursive_reducer; Partitioner = const tbb::simple_partitioner]’/home/julian/R/x86_64-pc-linux-gnu-library/4.0/StanHeaders/include/tbb/partitioner.h:507:9:   required from ‘void tbb::interface9::internal::simple_partition_type::execute(StartType&, Range&) [with StartType = tbb::interface9::int
Error in sink(type = "output") : invalid connection

As it mentions parallel_reduce.h from StanHeaders, I tried to replace it with the version from RcppParallel, but that also did not help.
I also tried reinstalling StanHeaders and rstan after updating RcppParallel, but that also did not solve it. Any ideas?

@julianquandt Can you try after removing the following directory from StanHeaders:

rm -rf /home/julian/R/x86_64-pc-linux-gnu-library/4.0/StanHeaders/include/tbb

TBB headers shouldn’t be included in StanHeaders, but we have them there temporarily for the transition to CRAN. Some reverse dependencies need to link to RcppParallel, since TBB is now required by Stan, but their current CRAN versions do not.
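
If your library path differs from the one above, the same directory can be located from within R. This is a rough sketch, not an official procedure: it assumes the bundled headers live in include/tbb of the installed StanHeaders package (as in the path above), and the helper names bundled_tbb_dir and remove_bundled_tbb are made up for illustration. It defaults to a dry run so that nothing is deleted by accident.

```r
# Hypothetical helper: path of the TBB headers bundled with StanHeaders,
# or NA if StanHeaders is not installed.
bundled_tbb_dir <- function() {
  inc <- system.file("include", package = "StanHeaders")
  if (!nzchar(inc)) {
    return(NA_character_)
  }
  file.path(inc, "tbb")
}

# Hypothetical helper: remove that directory. With dry_run = TRUE (default)
# it only reports what it would do. Returns TRUE only if something was removed.
remove_bundled_tbb <- function(dry_run = TRUE) {
  tbb_dir <- bundled_tbb_dir()
  if (is.na(tbb_dir) || !dir.exists(tbb_dir)) {
    message("No bundled TBB headers found; nothing to do.")
    return(invisible(FALSE))
  }
  if (dry_run) {
    message("Would remove: ", tbb_dir)
    return(invisible(FALSE))
  }
  # unlink() returns 0 on success
  invisible(unlink(tbb_dir, recursive = TRUE) == 0)
}

remove_bundled_tbb()  # dry run; pass dry_run = FALSE to actually delete
```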

Also, since you’re on Linux, I recommend using an external TBB library (oneTBB) for both RcppParallel and Stan. You need to follow the simple instructions here and reinstall the development version of RcppParallel.

remotes::install_git('https://github.com/RcppCore/RcppParallel')

Make sure that it links to the correct version of TBB:

RcppParallel::LdFlags()
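
To check this programmatically rather than by eye, something like the following might work. It is a sketch under two assumptions: that RcppParallel::LdFlags() prints the flags to the console (so they can be captured with capture.output), and that the TBB_LIB environment variable from the oneTBB setup points at the library directory you expect. The helper name flags_use_tbb is made up for illustration.

```r
# Hypothetical helper: TRUE if the linker flags mention the given TBB library
# directory. An empty tbb_lib (variable not set) always gives FALSE.
flags_use_tbb <- function(flags, tbb_lib) {
  nzchar(tbb_lib) && any(grepl(tbb_lib, flags, fixed = TRUE))
}

# Only query RcppParallel if it is installed.
if (nzchar(system.file(package = "RcppParallel"))) {
  flags <- utils::capture.output(RcppParallel::LdFlags())
  print(flags_use_tbb(flags, Sys.getenv("TBB_LIB")))
}
```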

Thanks for your help again!

I followed your recommendations and installed TBB (FYI, the wget link for oneTBB in the doc didn't work for me at first; I had to pass the version number directly), then reinstalled RcppParallel, StanHeaders and rstan.

The Flags check out:

-L'/home/julian/oneapi-tbb-2021.1.1/lib/intel64/gcc4.8' -Wl,-rpath,'/home/julian/oneapi-tbb-2021.1.1/lib/intel64/gcc4.8' -ltbb -ltbbmalloc

and after also removing tbb from StanHeaders it works like a charm now!!

Many thanks!!

also tagging @paul.buerkner to let you know that this indeed works now :)


@hsbadr so once the dev version of RcppParallel hits CRAN this will be fixed? Or will users still need to delete the files?

I think TBB headers in StanHeaders will always cause issues. @bgoodri added them (include tbb headers from RcppParallel · stan-dev/rstan@701a7dc · GitHub) to build baggr.

Since we’re waiting for other dependencies to release patches (e.g., Support Stan v2.26+ / Math v4.0+ by hsbadr · Pull Request #308 · OpenMx/OpenMx · GitHub) for the new version of StanHeaders, I suggest removing the TBB headers now and asking the devs either to link to RcppParallel as a TBB requirement for Stan >= 2.26 or to respect the compiler flags of the rstan plugin.


@rok_cesnovar I’ve removed TBB headers from StanHeaders 2.26.1 (StanHeaders 2.26 supports rstan 2.21 by hsbadr · Pull Request #912 · stan-dev/rstan · GitHub). Please update the binary packages from the last GHA artifacts.

@bgoodri Please ask the devs of baggr and any affected reverse dependencies to link to TBB from RcppParallel (i.e. respecting RcppParallel::CxxFlags() and RcppParallel::LdFlags()). I think that almost all packages already link to RcppParallel, so this should affect just a few packages, if not baggr only.

A follow-up question: it works in an R session from the terminal, but it turns out that whenever I use RStudio (RStudio Server on WSL), the RcppParallel::LdFlags() are gone. Specifically, when I start rstudio-server on the WSL server, the flags seem to be reset and are also gone in the terminal. I can register TBB again via the method in the doc, but it doesn't affect the rstudio-server session, only R from the terminal, and after restarting the server the flags are gone again in both. Am I missing anything?

Works now, adding

TBB_VERSION="2021.1.1"
TBB=${HOME}/oneapi-tbb-${TBB_VERSION}
TBB_INC=${TBB}/include
TBB_LIB=${TBB}/lib/intel64/gcc4.8

to /etc/R/Renviron solved the problem!
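
For anyone hitting the same thing, a quick way to see whether a given session (terminal R or rstudio-server) picked up these variables is to inspect them from within R. A minimal sketch, assuming the variable names above; the helper name check_tbb_env is made up for illustration. TBB_VERSION is left out because it is not a directory.

```r
# Hypothetical helper: for each TBB variable, show its value in the current
# session and whether it points at an existing directory.
check_tbb_env <- function(vars = c("TBB", "TBB_INC", "TBB_LIB")) {
  values <- unname(Sys.getenv(vars, unset = NA))
  # Treat unset variables as "directory does not exist"
  exists <- !is.na(values) & dir.exists(ifelse(is.na(values), "", values))
  data.frame(
    variable = vars,
    value = values,
    dir_exists = exists,
    row.names = NULL
  )
}

print(check_tbb_env())
```

Running this once in the terminal session and once in the rstudio-server session should show whether the two see the same environment.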
