Integrating GPU support

Exactly. That’s why you don’t see 1000-fold speedups with 1000-GPU
configurations in most applications that aren’t just streaming
graphics calculations. There’s going to be a minimum operation
size where it becomes worth paying the quadratic transport cost
(an N x N matrix ships N^2 double values) to parallelize a cubic
operation (multiplying two N x N matrices requires on the order
of N^3 operations). And I’m sure this will all depend on the CPU,
GPU, memory, and motherboard architecture.

For better or worse, we have lots of situations involving massive
matrix operations. I also don’t know how far it’s possible to push
tensor operations, like doing M matrix multiplications at once.
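For concreteness, "doing M matrix multiplications" would look something like the batched operation below. The point is that the M products are independent, so a batch can be shipped to a device in one transfer and computed in parallel. This is just a naive CPU sketch with flat row-major storage, not any particular library's API:

```cpp
#include <cassert>
#include <vector>

// Batched matrix multiply: M independent N x N products C[m] = A[m] * B[m].
// Each product touches only its own operands, so the loop over m is
// trivially parallelizable. Matrices are row-major in flat vectors.
using Batch = std::vector<std::vector<double>>;

Batch batched_matmul(const Batch& A, const Batch& B, int n) {
    Batch C(A.size(), std::vector<double>(n * n, 0.0));
    for (std::size_t m = 0; m < A.size(); ++m)     // independent across m
        for (int i = 0; i < n; ++i)
            for (int k = 0; k < n; ++k)
                for (int j = 0; j < n; ++j)
                    C[m][i * n + j] += A[m][i * n + k] * B[m][k * n + j];
    return C;
}
```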

Stan’s compatible with Eigen 3.3 now. Eigen 3.3 was a massive
change to the underlying template expressions and traits metaprograms.
So it should be easy to get started trying this with CmdStan.

We’re waiting on RcppEigen to move to 3.3 before we throw
away 3.2.x compatibility.

Another way to try to play with parallelism for Stan would
be to build a TensorFlow node out of Stan. That’s a little
different from what Edward’s trying to do. I believe TensorFlow’s
architecture involves nodes that compute functions with
Jacobians. Stan’s very efficient for functions on R^N -> R^M
when M << N. If M >> N, then forward-mode autodiff’s probably
going to be faster. I believe TensorFlow only has a limited
forward-mode library built in (compared to the number of
operations available in Stan math). Then setting up
something where we could distribute pieces of the density
and gradient calculations and recombine them would be very
useful. For example, when we do PK/PD models, we solve ODEs
for a lot of patients in a way that is embarrassingly parallelizable
and is the single bottleneck for the whole model. So you should
be able to speed these models up pretty well (they have a
high ratio of computation to communication).
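The map-reduce shape of that per-patient calculation can be sketched with plain `std::async`. Here `solve_patient` is a cheap stand-in for a real ODE solve, and the per-patient log-likelihood terms are just summed; the names and the toy likelihood are invented for illustration:

```cpp
#include <cassert>
#include <cmath>
#include <future>
#include <vector>

// Stand-in for an expensive per-patient ODE solve: lots of compute,
// one scalar result, so the computation-to-communication ratio is high.
double solve_patient(double theta, double data) {
    double y = data;
    for (int step = 0; step < 1000; ++step)  // fake "integration" loop
        y += 1e-3 * (theta - y);
    return -0.5 * y * y;                     // toy log-likelihood term
}

// Embarrassingly parallel map-reduce: fan out one task per patient,
// then sum the scalar contributions to the log density.
double parallel_log_density(double theta, const std::vector<double>& patients) {
    std::vector<std::future<double>> tasks;
    for (double d : patients)
        tasks.push_back(std::async(std::launch::async, solve_patient, theta, d));
    double lp = 0.0;
    for (auto& t : tasks)                    // fan in, in deterministic order
        lp += t.get();
    return lp;
}
```

Compile with `-std=c++11 -pthread`. Because the tasks are summed in a fixed order, the parallel result matches a serial loop bit-for-bit, which matters for reproducible gradients.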

  • Bob

It seems like Eigen 3.3 adds the capability to use Eigen matrices inside CUDA kernels, but it does not provide a mechanism for running matrix operations on GPUs. There is also an unsupported Tensor module (apparently used in TensorFlow), but it only supports basic operations. I think the option with the highest potential impact would be to add MAGMA as an Eigen backend, which someone started doing a few years ago with good results.

Curious if anyone has thoughts here.


ViennaCL is another possibility.


I like that ViennaCL has an MIT license. That’s
nice because it’s compatible with GPL and BSD.

  • Bob