Integrating GPU support

Exactly. That’s why you don’t see 1000-fold speedups with 1000-GPU
configurations in most applications that aren’t just streaming
graphics calculations. There’s going to be a minimum operation
size where it becomes worth paying the quadratic transport cost
(an N x N matrix ships N^2 double values) to parallelize a cubic
operation (multiplying two N x N matrices requires on the order
of N^3 operations). And I’m sure this will all depend on the CPU,
GPU, memory, and motherboard architecture.

For better or worse, we have lots of situations involving massive
matrix operations. I also don’t know how far it’s possible to push
tensor operations, like doing M matrix multiplications at once.
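For concreteness, "doing M matrix multiplications" would look something like the batched operation below. The point is that the M products are independent, so a batch can be shipped to a device in one transfer and computed in parallel. This is just a naive CPU sketch with flat row-major storage, not any particular library's API:

```cpp
#include <cassert>
#include <vector>

// Batched matrix multiply: M independent N x N products C[m] = A[m] * B[m].
// Each product touches only its own operands, so the loop over m is
// trivially parallelizable. Matrices are row-major in flat vectors.
using Batch = std::vector<std::vector<double>>;

Batch batched_matmul(const Batch& A, const Batch& B, int n) {
    Batch C(A.size(), std::vector<double>(n * n, 0.0));
    for (std::size_t m = 0; m < A.size(); ++m)     // independent across m
        for (int i = 0; i < n; ++i)
            for (int k = 0; k < n; ++k)
                for (int j = 0; j < n; ++j)
                    C[m][i * n + j] += A[m][i * n + k] * B[m][k * n + j];
    return C;
}
```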

Stan’s compatible with Eigen 3.3 now. Eigen 3.3 was a massive
change to the underlying template expressions and traits metaprograms.
So it should be easy to get started trying this with CmdStan.

We’re waiting on RcppEigen to move to 3.3 before we throw
away 3.2.x compatibility.

Another way to try to play with parallelism for Stan would
be to build a TensorFlow node out of Stan. That’s a little
different from what Edward’s trying to do. I believe TensorFlow’s
architecture involves nodes that compute functions with
Jacobians. Stan’s very efficient for functions on R^N -> R^M
when M << N. If M >> N, then forward-mode autodiff’s probably
going to be faster. I believe TensorFlow only has a limited
forward-mode library built in (compared to the number of
operations available in Stan math). Then setting up
something where we could distribute pieces of the density
and gradient calculations and recombine them would be very
useful. For example, when we do PK/PD models, we solve ODEs
for a lot of patients in a way that is embarrassingly parallelizable
and is the single bottleneck for the whole model. So you should
be able to speed these models up pretty well (they have a
high ratio of computation to communication).
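The map-reduce shape of that per-patient calculation can be sketched with plain `std::async`. Here `solve_patient` is a cheap stand-in for a real ODE solve, and the per-patient log-likelihood terms are just summed; the names and the toy likelihood are invented for illustration:

```cpp
#include <cassert>
#include <cmath>
#include <future>
#include <vector>

// Stand-in for an expensive per-patient ODE solve: lots of compute,
// one scalar result, so the computation-to-communication ratio is high.
double solve_patient(double theta, double data) {
    double y = data;
    for (int step = 0; step < 1000; ++step)  // fake "integration" loop
        y += 1e-3 * (theta - y);
    return -0.5 * y * y;                     // toy log-likelihood term
}

// Embarrassingly parallel map-reduce: fan out one task per patient,
// then sum the scalar contributions to the log density.
double parallel_log_density(double theta, const std::vector<double>& patients) {
    std::vector<std::future<double>> tasks;
    for (double d : patients)
        tasks.push_back(std::async(std::launch::async, solve_patient, theta, d));
    double lp = 0.0;
    for (auto& t : tasks)                    // fan in, in deterministic order
        lp += t.get();
    return lp;
}
```

Compile with `-std=c++11 -pthread`. Because the tasks are summed in a fixed order, the parallel result matches a serial loop bit-for-bit, which matters for reproducible gradients.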

  • Bob

It seems like Eigen 3.3 adds the capability to use Eigen matrices inside CUDA kernels, but it does not provide a mechanism for running matrix operations on GPUs. There is also an unsupported Tensor module (apparently used in TensorFlow), but it only supports basic operations. I think the option with the highest potential impact would be to add MAGMA as an Eigen backend, which someone started doing a few years ago with good results.

Curious if anyone has thoughts here.


ViennaCL is another possibility.


I like that ViennaCL has an MIT license. That’s
nice because it’s compatible with GPL and BSD.

  • Bob