Gregory
November 17, 2018, 6:30am
Hi everyone,
I’ve implemented a version of a model that uses map_rect(), and it appears to work just like the original version. But I’m not sure how to run it in rstan across all of the available processors. I’ll probably be running the model as an sbatch job on an MPI cluster, or locally on a 20-core machine. I’m using rstan, and I’ve looked through the manual, CRAN, and mc-stan.org, but I can’t find out how to start using all of the available resources. Also, if I run 4 chains on a 20-core machine, does rstan automatically distribute the cores among the chains as needed?
Thanks so much!
My ~/.R/Makevars:
CXX14FLAGS=-O3 -march=native -mtune=native
CXX14FLAGS += -arch x86_64 -ftemplate-depth-256
rstan version:
rstan 2.18.2 2018-11-07 [1] CRAN (R 3.5.0)
bgoodri
November 17, 2018, 7:06am
It is basically what the linked Stan Math Library page describes, except that you need to put your configuration into the ~/.R/Makevars file rather than into make/local of a CmdStan installation.
Running it locally on a 20-core machine entails much less configuration. Just follow the linked Stan Math Library page, with CXX14FLAGS += -DSTAN_THREADS in ~/.R/Makevars, and set the environment variable STAN_NUM_THREADS at runtime.
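A minimal R sketch of the Makevars step described above. The helper name add_stan_threads_flag() is hypothetical (it is not part of rstan). It appends CXX14FLAGS += -DSTAN_THREADS after all existing lines, so that an earlier plain CXX14FLAGS = line cannot overwrite it. Because -DSTAN_THREADS is a compile-time define, the model has to be compiled after the change for it to take effect. The demo writes to a temporary file rather than the real ~/.R/Makevars.

```r
# Hypothetical helper (not part of rstan): add the threading flag to a Makevars file.
add_stan_threads_flag <- function(makevars = file.path(Sys.getenv("HOME"), ".R", "Makevars")) {
  flag_line <- "CXX14FLAGS += -DSTAN_THREADS"
  # Create ~/.R (or the demo directory) if it does not exist yet
  dir.create(dirname(makevars), showWarnings = FALSE, recursive = TRUE)
  existing <- if (file.exists(makevars)) readLines(makevars, warn = FALSE) else character(0)
  # Skip if the exact line is already present; otherwise append it after all existing lines
  if (!(flag_line %in% trimws(existing))) {
    writeLines(c(existing, flag_line), makevars)
  }
  invisible(makevars)
}

# Demo against a temporary file instead of the real ~/.R/Makevars
demo_makevars <- tempfile("Makevars")
add_stan_threads_flag(demo_makevars)
add_stan_threads_flag(demo_makevars)  # second call leaves the file unchanged
readLines(demo_makevars)
```

To apply it for real, call add_stan_threads_flag() with no argument and then recompile the model.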
Gregory
November 17, 2018, 7:15am
Thanks! Just so I’m clear: to set the environment variable (on a Mac), can I just run the following in R?
Sys.setenv("STAN_NUM_THREADS" = 4)
Or do I need to set this somehow before launching R itself? Is there a simple way to test whether it’s working?
bgoodri
November 17, 2018, 7:17am
You can do it after launching R but before calling stan() or sampling(). You will hear it if it is working.
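A small R sketch for checking the variable from inside the session before calling sampling(). The helper stan_num_threads() is hypothetical, not an rstan function. It only confirms what the R session will pass on; whether map_rect() actually uses the threads still depends on the model having been compiled with -DSTAN_THREADS, so watching Activity Monitor or htop during sampling remains the practical test.

```r
# Hypothetical helper (not part of rstan): report the thread count that
# STAN_NUM_THREADS currently requests, or NA (with a warning) if it is unset.
stan_num_threads <- function() {
  value <- Sys.getenv("STAN_NUM_THREADS", unset = "")
  if (value == "") {
    warning("STAN_NUM_THREADS is not set")
    return(NA_integer_)
  }
  as.integer(value)
}

Sys.setenv(STAN_NUM_THREADS = 4)  # after starting R, before stan() or sampling()
stan_num_threads()
```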
Gregory
November 17, 2018, 7:29am
It appears that only one processor (not four) is working.
My Makevars in ~/.R/ looks like this:
CXX14FLAGS=-O3 -march=native -mtune=native
CXX14FLAGS += -arch x86_64 -ftemplate-depth-256
CXXFLAGS += -DSTAN_THREADS
Right when I load R, I type this:
Sys.setenv("STAN_NUM_THREADS" = 4)
Then I do:
library("rstan")
etc.
Then my sampling call is:
fitted_model <- sampling(my_model,
                         data = my_data,
                         warmup = 1000, iter = 2000,
                         thin = 2, refresh = 50,
                         open_progress = TRUE,
                         chains = 1)
Where my_model has about 300 shards.
But in Activity Monitor I see only one R thread running (at 100%). I suspect I’m making a simple mistake here.
Gregory
November 17, 2018, 7:37am
Just to (I think) answer my own question, I needed to add:
CXX14FLAGS += -DSTAN_THREADS
instead of
CXXFLAGS += -DSTAN_THREADS
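The fix above suggests that rstan 2.18 compiles models with R's C++14 settings, which read CXX14FLAGS, so a define placed in CXXFLAGS never reaches the model. A rough R sketch for spotting this in a Makevars file: the helper flag_variables() is hypothetical, and it only recognizes simple NAME = value and NAME += value lines.

```r
# Hypothetical helper: which Makevars variables carry a given flag?
flag_variables <- function(lines, flag = "-DSTAN_THREADS") {
  # Match simple "NAME = value" or "NAME += value" assignments
  pattern <- "^\\s*([A-Za-z0-9_]+)\\s*\\+?=\\s*(.*)$"
  assignments <- grep(pattern, lines, value = TRUE)
  vars <- sub(pattern, "\\1", assignments)
  values <- sub(pattern, "\\2", assignments)
  # Compare whole whitespace-separated tokens, not substrings
  has_flag <- vapply(strsplit(values, "\\s+"), function(tokens) flag %in% tokens, logical(1))
  unique(vars[has_flag])
}

before <- c("CXX14FLAGS=-O3 -march=native -mtune=native",
            "CXX14FLAGS += -arch x86_64 -ftemplate-depth-256",
            "CXXFLAGS += -DSTAN_THREADS")
after <- sub("^CXXFLAGS", "CXX14FLAGS", before)

flag_variables(before)  # "CXXFLAGS": not read when the model is compiled as C++14
flag_variables(after)   # "CXX14FLAGS"
```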
Gregory
November 19, 2018, 4:02am
After getting this working on my personal computer, I’ve been trying to get it working on a high-performance computing cluster. However, unlike on my personal computer, it doesn’t appear to use any more cores than there are chains: with 4 chains on a 20-core system, htop shows just 4 cores at 100% and the rest at 0%. This is on a CentOS machine. My .R/Makevars is:
CXX14 = icc -fPIC
CXX14FLAGS += -DSTAN_THREADS
CXX14FLAGS=-O3 -march=native -mtune=native
and I run this right after loading rstan:
Sys.setenv("STAN_NUM_THREADS" = 20)
Any ideas why it would work on my personal computer (a Mac), but not on the 20-core CentOS machine? Thanks!
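A related point from the original question about 4 chains on a 20-core machine: assuming each parallel chain reads the same STAN_NUM_THREADS value, setting it to 20 with 4 chains could request up to 4 × 20 = 80 threads on 20 cores. A common rule of thumb (an assumption, not something stated in this thread) is to divide the cores among the chains. The helper threads_per_chain() is hypothetical; options(mc.cores) is the default that rstan's sampling() uses for its cores argument. Note that parallel::detectCores() counts logical CPUs, so hyperthreading may inflate the count.

```r
# Hypothetical helper (not part of rstan): threads per chain when the cores
# are divided evenly among chains running at the same time.
threads_per_chain <- function(cores, chains) {
  stopifnot(cores >= 1, chains >= 1)
  max(1L, as.integer(cores %/% chains))
}

n_cores <- parallel::detectCores()  # logical CPUs; may be NA on some platforms
if (is.na(n_cores)) n_cores <- 1L
n_chains <- 4

options(mc.cores = n_chains)  # default for sampling(cores = ...)
Sys.setenv(STAN_NUM_THREADS = threads_per_chain(n_cores, n_chains))
Sys.getenv("STAN_NUM_THREADS")
```

On a machine reporting 20 cores, this requests 5 threads per chain.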
Gregory
November 19, 2018, 4:18am
Looks like I got it working. I think my mistake was having
CXX14FLAGS = -O3 -march=native -mtune=native
instead of
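CXX14FLAGS += -O3 -march=native -mtune=native
(A plain = replaces any earlier value of the variable, so having it after CXX14FLAGS += -DSTAN_THREADS discarded the threading flag.)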
My new Makevars, which works, is:
CXX14 = icc
CXX14FLAGS = -DSTAN_THREADS
CXX14FLAGS += -O3 -march=native -mtune=native
CXX14FLAGS += -fPIC
Thanks again for all the help!
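The = versus += issue above can be illustrated with a simplified R sketch of how make combines assignments to one variable: = replaces the value and += appends to it. The helper effective_value() is hypothetical and ignores $(...) references, conditionals, and :=. It also starts from an empty value, whereas in practice += presumably appends to R's own default flags.

```r
# Simplified model of make's "=" (replace) and "+=" (append) for one variable.
effective_value <- function(lines, var) {
  pattern <- paste0("^\\s*", var, "\\s*(\\+?=)\\s*(.*)$")
  value <- character(0)
  for (line in grep(pattern, lines, value = TRUE)) {
    op <- sub(pattern, "\\1", line)
    rhs <- strsplit(sub(pattern, "\\2", line), "\\s+")[[1]]
    value <- if (op == "=") rhs else c(value, rhs)  # "=" replaces, "+=" appends
  }
  paste(value, collapse = " ")
}

centos_before <- c("CXX14 = icc -fPIC",
                   "CXX14FLAGS += -DSTAN_THREADS",
                   "CXX14FLAGS=-O3 -march=native -mtune=native")
centos_after <- c("CXX14 = icc",
                  "CXX14FLAGS = -DSTAN_THREADS",
                  "CXX14FLAGS += -O3 -march=native -mtune=native",
                  "CXX14FLAGS += -fPIC")

effective_value(centos_before, "CXX14FLAGS")  # "-O3 -march=native -mtune=native"
effective_value(centos_after, "CXX14FLAGS")   # "-DSTAN_THREADS -O3 -march=native -mtune=native -fPIC"
```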