I wanted to give feedback on my use of CmdStan 2.18.0 with MPI on a relatively big computer cluster. I also detail the solutions to some problems I encountered during installation, which might be of use to other people like me (not that comfortable with this kind of stuff). The computer cluster I have access to belongs to the University of Bern (https://ubelix.unibe.ch/) and runs on CentOS 7.5.1804.
Installation
Installation was a bit difficult, even with the help of the cluster admins. It’s a bit over my head, but at first there were problems with conflicting versions of Boost, and then a missing header file (pyconfig.h), which was eventually solved by installing the python-devel package. So overall, what worked for me was:
- loading the module with the right version of Boost
- installing python-devel
- creating a file named cmdstan-2.18.0/make/local with the following lines:
  CC=mpicxx
  STAN_MPI=true
- running make build -j24
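The last two steps can be sketched as a small shell helper. This is only a sketch: the function name write_make_local and the install path in the usage comments are hypothetical, and the Boost module name is site-specific, so it is left as a placeholder.

```shell
# Sketch: write the MPI settings from the steps above into make/local.
# write_make_local is a hypothetical helper name, not part of CmdStan.
write_make_local() {
  cmdstan_dir="$1"
  mkdir -p "$cmdstan_dir/make"
  # Quoted heredoc so the two lines are written verbatim.
  cat > "$cmdstan_dir/make/local" <<'EOF'
CC=mpicxx
STAN_MPI=true
EOF
}

# Usage (commented out: it needs a real CmdStan checkout, and the path is an assumption):
#   module load <your-boost-module>      # module name depends on the cluster
#   write_make_local "$HOME/cmdstan-2.18.0"
#   cd "$HOME/cmdstan-2.18.0" && make build -j24
```

The -j24 simply matches the 24 cores of one node; a smaller number works too, only slower.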
Compiling models
The installed CmdStan version was then able to compile the example model in “examples/bernoulli”. I then wanted to run some tests using Daniel Lee’s ODE examples (https://github.com/generable/stan-ode-workshop). At first I could not compile models that include map_rect(), because of an undefined reference to 'pthread_create' error. After a bit of googling, I found a solution: adding LDFLAGS= -pthread -lpthread to make/local.
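For anyone hitting the same linker error, here is a minimal sketch of adding that line from the shell. The helper name add_pthread_flags is hypothetical, and the check for an existing LDFLAGS line is my own precaution so that re-running it does not duplicate the entry.

```shell
# Sketch: append the pthread linker flags to make/local, but only once.
# add_pthread_flags is a hypothetical helper name, not part of CmdStan.
add_pthread_flags() {
  local_file="$1"
  # Skip the append if an LDFLAGS line is already present.
  if ! grep -q '^LDFLAGS' "$local_file" 2>/dev/null; then
    echo 'LDFLAGS= -pthread -lpthread' >> "$local_file"
  fi
}

# Usage (path is an assumption):
#   add_pthread_flags "$HOME/cmdstan-2.18.0/make/local"
```

If make/local already sets LDFLAGS for another reason, the helper leaves it alone and the flags have to be merged by hand.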
Time gains
I tried to sample one chain of 2000 iterations from Daniel’s SHO model, with parallelization on a single node (24 CPUs, 64 GB RAM). Over five runs, sampling took {405,422,424,446,427} seconds without MPI (sho_fit_multiple.stan); with MPI (sho_fit_multiple_parallel.stan) that time dropped to {322,330,309,328,335} seconds. So on average a reduction of 24% (mean of 424.8 s versus 324.8 s).
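The average can be checked directly from the run times quoted above. This is just arithmetic on those ten numbers, written with awk; the helper name mean and the variable names are my own.

```shell
# Mean of the arguments, printed with one decimal place.
mean() {
  printf '%s\n' "$@" | awk '{ s += $1 } END { printf "%.1f", s / NR }'
}

serial=$(mean 405 422 424 446 427)   # runs without MPI
mpi=$(mean 322 330 309 328 335)      # runs with MPI

# Relative reduction in sampling time, in percent.
reduction=$(awk -v a="$serial" -v b="$mpi" 'BEGIN { printf "%.1f", 100 * (a - b) / a }')
# Equivalent speedup factor.
speedup=$(awk -v a="$serial" -v b="$mpi" 'BEGIN { printf "%.2f", a / b }')

echo "mean without MPI: $serial s, with MPI: $mpi s"
echo "reduction: $reduction %, speedup: ${speedup}x"
```

This gives a reduction of 23.5%, which rounds to the 24% above, or a speedup of about 1.31x on 24 cores.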
It is a bit less than I expected, but I’m not completely sure whether I’m using MPI to its full potential here, or whether there is a way to monitor CPU usage. My next step is to implement map_rect() in my own model, which has more data, more complex ODEs and a hierarchical structure. Maybe the difference will be larger in that case. If there is any interest I will continue to post the results here.
Anyway I’d like to thank the Stan developers for implementing such cutting-edge methods!