RStan and map_rect

Hi everyone,

I’ve rewritten a model to use map_rect(), and it appears to work just like the original version. But I’m not sure how to run it in rstan across all of the available processors. I’ll probably be running the model as an sbatch job on an MPI cluster, or locally on a 20-core machine. I’ve looked through the rstan manual, CRAN, and mc-stan.org, and I can’t find out how to start using all of the available resources. Also, if I run 4 chains on a 20-core machine, does rstan automatically distribute the resources as needed?

Thanks so much!

CXX14FLAGS=-O3 -march=native -mtune=native
CXX14FLAGS += -arch x86_64 -ftemplate-depth-256

rstan 2.18.2 2018-11-07 [1] CRAN (R 3.5.0)

It is basically this


except you need to put your configuration into the ~/.R/Makevars file rather than make/local of a CmdStan installation.

Running it locally on a 20-core machine entails much less configuration. Just


with CXX14FLAGS += -DSTAN_THREADS in ~/.R/Makevars and setting the environment variable STAN_NUM_THREADS at runtime.

Thanks! Just so I’m clear: to set the environment variable (on a Mac), can I just do the following in R?

Sys.setenv("STAN_NUM_THREADS" = 4)

Or do I need to set this somehow before launching R itself? Is there a simple way to test whether it’s working?

You can do it after launching R but before calling stan or sampling. You will hear if it is working.
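
For a check that does not rely on fan noise, here is a minimal sketch of setting the variable from inside R and reading it back before sampling. It only confirms that the variable is set in the R session; it says nothing about whether the model was compiled with -DSTAN_THREADS. The value 4 and the comparison against parallel::detectCores() are illustrative assumptions, not something rstan requires.

```r
# Set the thread count for map_rect() before calling stan() or sampling().
# 4 is just the value used in this thread; pick what suits your machine.
Sys.setenv(STAN_NUM_THREADS = 4)

# Read it back and make sure it parses as a positive integer.
threads <- as.integer(Sys.getenv("STAN_NUM_THREADS", unset = NA))
stopifnot(!is.na(threads), threads >= 1L)

# Optional sanity check against the cores R can see (may be NA on some systems).
cores <- parallel::detectCores()
if (!is.na(cores) && threads > cores) {
  warning("STAN_NUM_THREADS exceeds the number of detected cores")
}
```

If the variable reads back correctly but only one core is busy during sampling, the compiler flags in ~/.R/Makevars are the next thing to check.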

It appears that only one processor (not four) is working.

My Makevars in ~/.R/ looks like this:

CXX14FLAGS=-O3 -march=native -mtune=native
CXX14FLAGS += -arch x86_64 -ftemplate-depth-256
CXXFLAGS += -DSTAN_THREADS

Right when I load R, I type this:

Sys.setenv("STAN_NUM_THREADS" = 4)

Then I do:

library("rstan")
etc.

Then my sampling call is:

fitted_model <- sampling(my_model,
                         data = my_data,
                         warmup = 1000, iter = 2000,
                         thin = 2, refresh = 50,
                         open_progress = TRUE,
                         chains = 1)

Where my_model has about 300 shards.

But in Activity Monitor, I see only one R thread running (at 100%). I suspect I’m making a simple mistake here?

Just to (I think) answer my own question, I needed to add:

CXX14FLAGS += -DSTAN_THREADS

instead of

CXXFLAGS += -DSTAN_THREADS

Yes

After getting this working on my personal computer, I’ve been trying to get it working on a high-performance computing cluster. However, unlike on my personal computer, it doesn’t appear to be using any more cores than there are chains. Right now I have 4 chains on a 20-core system, and htop shows just 4 cores at 100% and the rest at 0%. This is on a CentOS machine. My .R/Makevars is:

CXX14 = icc -fPIC
CXX14FLAGS += -DSTAN_THREADS
CXX14FLAGS=-O3 -march=native -mtune=native

and I run this right after loading rstan:

Sys.setenv("STAN_NUM_THREADS" = 20)

Any ideas why it would work on my personal computer (a Mac), but not on the 20-core CentOS machine? Thanks!

Looks like I got it working. I think my mistake was having the following line after the -DSTAN_THREADS line, where the plain = overwrote the flag:

CXX14FLAGS = -O3 -march=native -mtune=native

instead of

CXX14FLAGS += -O3 -march=native -mtune=native

My new Makevars, which works, is:

CXX14 = icc
CXX14FLAGS = -DSTAN_THREADS
CXX14FLAGS += -O3 -march=native -mtune=native
CXX14FLAGS += -fPIC

Thanks again for all the help!
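
Both Makevars problems in this thread come down to whether -DSTAN_THREADS ends up in the final value of CXX14FLAGS. The sketch below resolves that value from Makevars lines in base R. resolve_make_var() and has_threads() are hypothetical helpers written for this illustration, not part of rstan, and they handle only plain = (overwrite) and += (append), with no variable expansion, comments, or line continuations.

```r
# Hypothetical helper: final value of one variable from Makevars lines,
# handling only `=` (overwrite) and `+=` (append).
resolve_make_var <- function(lines, var) {
  value <- character(0)
  pattern <- paste0("^[[:space:]]*", var,
                    "[[:space:]]*(\\+?=)[[:space:]]*(.*)$")
  for (line in lines) {
    m <- regmatches(line, regexec(pattern, line))[[1]]
    if (length(m) == 0) next            # line does not assign `var`
    flags <- strsplit(trimws(m[3]), "[[:space:]]+")[[1]]
    flags <- flags[nzchar(flags)]
    value <- if (m[2] == "+=") c(value, flags) else flags
  }
  value
}

# Hypothetical helper: does the threading flag survive in CXX14FLAGS?
has_threads <- function(lines) {
  "-DSTAN_THREADS" %in% resolve_make_var(lines, "CXX14FLAGS")
}

# The three Makevars files quoted in this thread.
mac_broken <- c("CXX14FLAGS=-O3 -march=native -mtune=native",
                "CXX14FLAGS += -arch x86_64 -ftemplate-depth-256",
                "CXXFLAGS += -DSTAN_THREADS")      # flag is on the wrong variable
centos_broken <- c("CXX14 = icc -fPIC",
                   "CXX14FLAGS += -DSTAN_THREADS",
                   "CXX14FLAGS=-O3 -march=native -mtune=native")  # `=` overwrites
centos_working <- c("CXX14 = icc",
                    "CXX14FLAGS = -DSTAN_THREADS",
                    "CXX14FLAGS += -O3 -march=native -mtune=native",
                    "CXX14FLAGS += -fPIC")

has_threads(mac_broken)      # FALSE
has_threads(centos_broken)   # FALSE
has_threads(centos_working)  # TRUE
```

To run the same check against a real file, something like has_threads(readLines("~/.R/Makevars")) should work, subject to the simplifications noted above.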