Feedback on installing and using MPI on a computer cluster

I wanted to give some feedback on my use of cmdstan 2.18.0 with MPI on a relatively big computer cluster. I also detail the solutions to some problems I encountered during installation, which might be of use to other people like me (not that comfortable with this kind of stuff). The computer cluster I have access to belongs to the University of Bern (https://ubelix.unibe.ch/) and runs on CentOS 7.5.1804.

Installation

Installation was a bit difficult, even with the help of the cluster admins. It’s a bit over my head, but at first there were problems with conflicting versions of Boost, and then a missing header file (pyconfig.h), which was eventually solved by installing the python-devel package. So overall, what worked for me was the following (a rough shell sketch of these steps is given after the list):

  • loading the module with the right version of Boost
  • installing python-devel
  • creating a file named cmdstan-2.18.0/make/local with the following lines:
    CC=mpicxx
    STAN_MPI=true
  • running make build -j24
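
For reference, here is a rough shell sketch of those steps. The module name is only a placeholder (check module avail on your cluster for whatever provides mpicxx and Boost), and python-devel will typically have to be installed by the admins:

    # Placeholder module name; the actual module(s) providing mpicxx and Boost
    # are cluster-specific.
    module load Boost
    # python-devel was installed system-wide (by the admins); on CentOS roughly:
    # sudo yum install python-devel

    cd cmdstan-2.18.0
    printf 'CC=mpicxx\nSTAN_MPI=true\n' > make/local
    make build -j24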

Compiling models

The installed cmdstan version was then able to compile the example model in “examples/bernoulli”. I then wanted to run some tests using Daniel Lee’s ODE examples (https://github.com/generable/stan-ode-workshop). At first I could not compile models that include map_rect(), because of an undefined reference to 'pthread_create'. After a bit of googling, I found a solution: adding LDFLAGS= -pthread -lpthread to make/local.
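
In case the exact change is useful, it boils down to appending one line to make/local and recompiling the model that failed to link (the model path below is only a placeholder):

    echo 'LDFLAGS= -pthread -lpthread' >> make/local
    make path/to/model_with_map_rect    # placeholder path: rebuild the model that failed to link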

Time gains

I tried to sample one chain of 2000 iterations from Daniel’s SHO model with parallelization on a single node (24 CPUs, 64 GB RAM). Sampling took {405, 422, 424, 446, 427} seconds without MPI (sho_fit_multiple.stan), and with MPI (sho_fit_multiple_parallel.stan) that time was reduced to {322, 330, 309, 328, 335} seconds, so on average a reduction of about 24% (424.8 s vs. 324.8 s).
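
For reference, such a run is launched under mpirun; the line below is only a sketch (the process count and the data/output file names are placeholders):

    # Placeholder launch line; data and output file names are made up.
    mpirun -np 24 ./sho_fit_multiple_parallel sample \
        data file=sho_data.R output file=samples_parallel.csv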

It is a bit less than I expected, but I’m not completely sure whether I’m using MPI to its full potential here, or whether there is a way to monitor CPU usage. My next step is to implement map_rect() in my own model, which has more data, more complex ODEs, and a hierarchical structure. Maybe the difference will be larger in that case. If there is any interest, I will keep posting the results here.

Anyway, I’d like to thank the Stan developers for implementing such cutting-edge methods!


Thanks very much for sharing this. I had been trying to get this running on an up-to-date Ubuntu 18.04 system and encountering the following error (pasted here in case it helps anyone else with the problem find this thread):

/usr/bin/ld: stan/lib/stan_math/lib/gtest_1.7.0/src/gtest-all.o: undefined reference to symbol 'pthread_key_delete@@GLIBC_2.2.5'

The pthread-related LDFLAGS were the missing link. For completeness, my working make/local looks like the following:

STAN_MPI=true
CXX=mpicxx
LDFLAGS += -pthread -lpthread 

Same as yours, except CXX instead of CC. I saw this was recently changed on the MPI wiki page, and this issue suggests that (some? all?) bits of Stan now use CXX instead (although neither the history on that issue nor a Discourse search makes it clear that this is universally true, so beware… I’m just a C++-fearing, cargo-cult monkey bashing my keyboard until things mysteriously work). I am also not sure whether the MPI wiki page should be updated with this LDFLAGS, or whether this is an issue specific to some systems.
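
One practical note: if make/local is changed after cmdstan has already been built (e.g. switching from CC to CXX), it is probably safest to rebuild from scratch, roughly like this:

    make clean-all                        # wipe objects built with the old settings
    make build -j4                        # -j value is arbitrary
    make examples/bernoulli/bernoulli     # quick check that a model still compiles and links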

Anyway, thanks again for sharing your solution, and thanks to the ever-amazing Stan dev team for bringing us this feature. Ever since I heard @Bob_Carpenter talk about MPI-powered map_rect in Helsinki, I’ve been dying to get this working. It is a real game-changer!

I’m glad I could be useful!