Threading and mpi and tbb and gpu in cmdstan

linas · October 30, 2019, 2:53pm

Hi,

To parallelize sampling I initially used MPI. As I understand it, as long as you are not using more than one node, threading is no less effective than MPI. Right?

There is a thread, "Benefits of parallelization with a threadpool of the Intel TBB", where it is shown that the TBB is more efficient than MPI. Is there any way to invoke the different TBB methods demonstrated in that thread?

I have compiled CmdStan with the following flags in make/local:
STAN_OPENCL=true
OPENCL_DEVICE_ID=0
OPENCL_PLATFORM_ID=0

Does this mean that, when sampling, CmdStan uses the GPU exclusively for the Cholesky decomposition and the CPU for everything else? Does it use the GPU for matrix operations?

Thanks

rok_cesnovar · October 30, 2019, 4:25pm

Hi Linas,

If you are using map_rect, which I presume you are since you are using MPI, the only thing you need to do in order to use TBB for threading is:

make sure you are using CmdStan 2.21
turn on threading by adding CXXFLAGS += -DSTAN_THREADS to make/local (see https://github.com/stan-dev/math/wiki/Threading-Support for more)
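
As a rough sketch of the two steps above, the flag can be appended to make/local from a shell. CMDSTAN_DIR is a hypothetical variable that should point at your own CmdStan checkout, and the thread count of 4 is only an example. STAN_NUM_THREADS is the environment variable that, to my understanding, the threaded map_rect reads at run time in this release; check the linked wiki page for your version.

```shell
# Hypothetical location of the CmdStan checkout; adjust to your setup.
CMDSTAN_DIR="${CMDSTAN_DIR:-./cmdstan}"

# make/local lives in the make/ directory of the checkout.
mkdir -p "$CMDSTAN_DIR/make"

# Append the threading flag only if it is not already present.
grep -qxF 'CXXFLAGS += -DSTAN_THREADS' "$CMDSTAN_DIR/make/local" 2>/dev/null \
  || echo 'CXXFLAGS += -DSTAN_THREADS' >> "$CMDSTAN_DIR/make/local"

# Number of threads available to map_rect at run time (example value).
export STAN_NUM_THREADS=4
```

The model has to be rebuilt after make/local changes so that the flag takes effect.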

Regarding OpenCL support:

If you are using CmdStan 2.21, the following functions will run on the GPU if the input size is large enough:

cholesky_decompose
matrix multiplication
mdivide_left_tri_low
mdivide_right_tri_low
gp_cov_exp_quad

The plan is to support most Stan functions on GPUs in the 2.22 release, but that doesn't help you here.

wds15 · October 30, 2019, 5:02pm

At the moment the TBB is used to parallelise map_rect, and I would expect MPI to give you the same performance. As far as I can see, the thread you refer to doesn’t even compare against MPI. In my experience, the TBB map_rect is now just as fast as MPI. Threading used to lag behind MPI in speed, but that slowness is gone now that the TBB is used.

linas · October 30, 2019, 7:43pm

Thanks for the help. Should I expect any difference between threading and MPI (in favor of MPI)?

wds15 · October 30, 2019, 9:24pm

No
