GPU support in Rstan?

JulianK · May 1, 2019, 8:24am

Hello! Curious as to when rstan 2.19 is likely to be out, which I presume will have GPU support :)

Thanks!

bgoodri · May 1, 2019, 3:10pm

It can be done with the GitHub versions of StanHeaders and rstan currently. I’m trying to sort out some remaining issues today so that it can be uploaded to CRAN.

stevebronder · May 1, 2019, 3:47pm

Hopefully users will be able to follow the install instructions for OpenCL available here. Once OpenCL is installed, it should just be a matter of adding the flags -DSTAN_OPENCL, OPENCL_DEVICE_ID, and OPENCL_PLATFORM_ID to Makevars, following the doc above for how to set them.

stemangiola · May 3, 2019, 3:39am

Is there by any chance any documentation on what operations will be sped up?

I see these ones

cholesky_decompose
inverse
diagonal_multiply

Others for example (?):

simple matrix multiplications
transpose a matrix
simple matrix manipulation (e.g., to vector)
simple element-wise operations (matrix .* matrix, square(matrix))
…

Thanks

rok_cesnovar · May 3, 2019, 8:02am

As of 2.19 (or 2.19.1) only cholesky_decompose speedups are exposed to the Stan users.

There are some other functions that are sped up under the hood (lower/upper triangular inverses, various forms of matrix multiplication, etc.) but those are currently only used inside the cholesky_decompose implementation. They are now being integrated into the user-exposed Stan functions mdivide_left_tri, multiply, etc.

We should see speedups for matrix multiply, mdivide_left_tri and some GLMs exposed to the user in 2.20. We are working on 2 larger OpenCL backend features (caching and async/out-of-order execution) and then we should be able to flesh those out.

Transposing, element-wise operations and such are a different story. We are currently only looking to speed up individual Stan functions where the input and output are both in the CPU's global memory for each iteration. The speedup from using a GPU to transpose a matrix is not large enough (the operation is too simple) to overcome the added overhead of transferring data to and from the GPU.

We will be able to provide speedups even for these simple functions for constant data (matrices of doubles in the backend) but not for variables (matrices of stan::math::var in the Stan Math backend). For variables we might be able to do this with Stan3 but that is still some time away.
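To illustrate the transfer-overhead argument above, here is a rough back-of-the-envelope cost model. It is only a sketch: every hardware number in it (bandwidths, throughputs, launch overhead) is an assumption picked for illustration, not a measurement, and the function names are made up for this example. It compares a compute-heavy operation (Cholesky, roughly n^3/3 flops) with a memory-bound one (transpose), where both input and output have to cross the host/device bus.

```python
# Rough cost model. All hardware numbers are illustrative assumptions,
# not measurements of any real CPU or GPU.
PCIE_BYTES_PER_S = 12e9      # assumed host <-> device transfer bandwidth
CPU_MEM_BYTES_PER_S = 20e9   # assumed CPU memory bandwidth
CPU_FLOPS = 50e9             # assumed sustained CPU throughput
GPU_FLOPS = 2000e9           # assumed sustained GPU throughput
LAUNCH_OVERHEAD_S = 20e-6    # assumed fixed cost of launching GPU work
BYTES_PER_DOUBLE = 8


def transfer_time(n_doubles):
    """Time to move n_doubles across the bus in one direction."""
    return n_doubles * BYTES_PER_DOUBLE / PCIE_BYTES_PER_S


def cholesky_times(n):
    """(cpu_seconds, gpu_seconds) for an n x n Cholesky, about n^3 / 3 flops."""
    flops = n ** 3 / 3
    cpu = flops / CPU_FLOPS
    # GPU path: copy the matrix in, compute, copy the factor out.
    gpu = LAUNCH_OVERHEAD_S + 2 * transfer_time(n * n) + flops / GPU_FLOPS
    return cpu, gpu


def transpose_times(n):
    """(cpu_seconds, gpu_seconds) for an n x n transpose (memory bound)."""
    # CPU path: read and write n * n doubles once each.
    cpu = 2 * n * n * BYTES_PER_DOUBLE / CPU_MEM_BYTES_PER_S
    # GPU path: even if the kernel itself were free, the data still has to
    # travel to the device and back.
    gpu = LAUNCH_OVERHEAD_S + 2 * transfer_time(n * n)
    return cpu, gpu


if __name__ == "__main__":
    for n in (50, 500, 2000):
        chol_cpu, chol_gpu = cholesky_times(n)
        tr_cpu, tr_gpu = transpose_times(n)
        print(f"n={n}: cholesky cpu/gpu = {chol_cpu / chol_gpu:.2f}, "
              f"transpose cpu/gpu = {tr_cpu / tr_gpu:.2f}")
```

Under these assumed numbers the transpose never pays off on the GPU at any size, because the two transfers alone cost more than doing the work on the CPU, while the Cholesky only wins once the matrix is large enough for the cubic compute cost to dominate the quadratic transfer cost.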

andre.pfeuffer · May 3, 2019, 8:20am

Any plans for columns_dot_product, log_sum_exp?

rok_cesnovar · May 3, 2019, 8:32am

Those are both on the short list to be evaluated next, yes. Once caching and async (out-of-order execution) are finished we might do another post to get some user feedback on their bottlenecks.

I forgot to mention in the previous post that gp_exp_quad_cov (the Stan function is cov_exp_quad I think) is also in the works and will be ready for 2.20.

wds15 · May 3, 2019, 9:18am

Shouldn’t we also go for a gp_exp_quad_cholesky??

I mean we usually need the Cholesky of the GP kernel. With a gp_exp_quad_cholesky the communication cost is reduced and all expensive steps are done in a single go on the GPU. Or is this planned to be handled with another approach (caching/async)?

rok_cesnovar · May 3, 2019, 9:50am

Yeah, that seems like a prime candidate to get huge speedups since the input is basically a vector, you do a cov_exp_quad and a cholesky on the GPU and return the matrix. But we need to add that C++ function to Stan Math first, or is that already happening?

Caching won't help us there.
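To make concrete what such a fused function would compute, here is a minimal CPU-only sketch in plain Python. It is not the Stan Math implementation and not GPU code: gp_exp_quad_cholesky is the hypothetical function proposed above, the helper names are made up for this example, and the diagonal jitter is an added assumption for numerical stability. The point is the shape of the interface: a length-n vector goes in and an n x n Cholesky factor comes out, so on a GPU the intermediate kernel matrix would never need to cross the bus.

```python
import math


def exp_quad_kernel(x, alpha, rho, jitter=1e-8):
    """Squared-exponential kernel matrix for a 1-D input vector x.

    K[i][j] = alpha^2 * exp(-(x[i] - x[j])^2 / (2 * rho^2)),
    with a small jitter added to the diagonal (an assumption of this sketch).
    """
    n = len(x)
    k = [
        [alpha ** 2 * math.exp(-((x[i] - x[j]) ** 2) / (2 * rho ** 2))
         for j in range(n)]
        for i in range(n)
    ]
    for i in range(n):
        k[i][i] += jitter
    return k


def cholesky_lower(a):
    """Lower-triangular L with L * L^T = a, for symmetric positive definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def gp_exp_quad_cholesky(x, alpha, rho, jitter=1e-8):
    """Hypothetical fused function: vector in, Cholesky factor of the kernel out."""
    return cholesky_lower(exp_quad_kernel(x, alpha, rho, jitter))


if __name__ == "__main__":
    factor = gp_exp_quad_cholesky([0.0, 0.5, 1.3, 2.0], alpha=1.5, rho=0.7)
    for row in factor:
        print(" ".join(f"{v:8.4f}" for v in row))
```

Doing the two steps as separate GPU calls would, without some form of data reuse on the device, copy the kernel matrix back to the host and then out to the GPU again before the Cholesky.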

wds15 · May 3, 2019, 12:22pm

No. Not yet there… but I think it is obvious that we want this… unless you find a good way that keeps things modular but does the magic in a single go anyway (expression templates?).
