Mike Lawrence suggested we might either make use of the gpuR package or reach out to its developers for help with GPU integration for Stan.
That might help with R installs. CRAN probably isn’t going to let us bundle our GPU dependencies unless they’re very small, so we might be able to use it the same way we use BH and RcppEigen.
P.S. I’m not going to post anything to the “developers” tag until it’s open to user comments.
Actually, I am not sure in the case of GPUs. We could certainly go that way if we were relying on ViennaCL. But if @rok_cesnovar is building a non-ViennaCL GPU library, then we will just have to put it into StanHeaders like we do with CVODES (there is an R package that bundles CVODE, without the S). Either way, not a big deal.
The amount of additional code for GPU support is not overwhelming, and I do not think it will be a problem.
For instance:
In terms of files, the core of the GPU lib is in 10 files. Beyond that, taking Cholesky as an example, there are 2 additional files in the Stan Math /rev/mat/fun and /prim/mat/fun directories.
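To make that concrete, here is a rough sketch of what the prim-side entry point could look like. The file path, function name, and body are my assumptions for illustration, not the actual GPU library code:

```cpp
// stan/math/prim/mat/fun/cholesky_decompose_gpu.hpp (hypothetical name)
#include <Eigen/Dense>

namespace stan {
namespace math {

// prim version: plain double-valued Cholesky that would offload to the GPU.
inline Eigen::MatrixXd cholesky_decompose_gpu(const Eigen::MatrixXd& A) {
  // A real implementation would (1) copy A into an OpenCL buffer,
  // (2) enqueue a blocked Cholesky kernel, and (3) read back the
  // lower-triangular factor L. A CPU stand-in keeps the sketch compilable:
  return A.llt().matrixL();
}

}  // namespace math
}  // namespace stan
```

The matching file under /rev/mat/fun would then wrap this in the autodiff machinery to supply the reverse-mode gradient of the Cholesky factor.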
Won’t we also need a dependency on the OpenCL GPU library?
We’ve been talking to some mxnet developers about using their sparse matrix libraries if they wind up implementing them with derivatives in a way that won’t lead to a dependency on all of mxnet. They went with CUDA only, claiming that OpenCL didn’t have the performance to be worth coding for. I have no idea what the reality is here. Mxnet is nice for us in that it supports double-precision arithmetic.
But the same goes if you decide to go with any other library. No matter what they do, the underlying library is either CUDA or OpenCL. To run CUDA you will also need a supported driver (which is not a problem, same as with OpenCL) and the CUDA Toolkit with the nvcc compiler in order to compile the GPU code.
As far as performance goes, CUDA does have some performance advantages on NVIDIA GPUs when fine-tuning for specific GPU architectures, and NVIDIA has put some focus on better support for deep learning. But I would dispute the claim that it is not worth coding in OpenCL, as the performance difference is not that big. Applications like Photoshop, GIMP, Autodesk Maya, LibreOffice, etc. all use OpenCL to speed themselves up.
If the goal is to run Stan on dedicated computers with (multiple) NVIDIA GPUs, then using CUDA (or libraries that use CUDA) would probably be the way to go. If the goal is to run Stan faster on a wider range of desktop computers with AMD/NVIDIA/Intel GPUs, then OpenCL would be the way to go. OpenCL also supports other accelerators like the Xeon Phi and is also targeting FPGAs.
As far as double-precision arithmetic goes, I am a bit confused, as both OpenCL and CUDA support double precision. We were discussing single precision only in the context of even bigger performance gains, as regular GPUs tend to have twice as many single-precision computing units as double-precision ones.
If you decide to go with mxnet, can we still count on your support if we have questions about the base Stan Math code and combining it with GPU code?
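For reference, double precision in OpenCL is an extension that a kernel has to enable explicitly, while single precision works on every device; a minimal illustration (the kernel names are made up):

```cpp
// Illustration only: OpenCL kernel source as it might be embedded in
// C++ host code. The fp64 variant must enable the cl_khr_fp64 device
// extension; the fp32 variant needs no pragma.
static const char* kScaleKernels = R"CLC(
// double-precision variant: requires the cl_khr_fp64 device extension
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
__kernel void scale_fp64(__global double* x, const double a) {
  x[get_global_id(0)] *= a;
}

// single-precision variant: supported everywhere, and consumer GPUs
// typically have at least twice the fp32 throughput of fp64
__kernel void scale_fp32(__global float* x, const float a) {
  x[get_global_id(0)] *= a;
}
)CLC";
```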
First, I was just verifying that we’d need external GPU libraries, too. Those external libs can be a huge pain in R, which won’t let us bundle them with RStan due to size unless they’re very small. Having multiple such external libs makes installation painful, and it’s already a big enough pain point with Stan.
For 32-bit vs. 64-bit: TensorFlow is concentrating primarily on 32-bit arithmetic, from what I can see of Edward and elsewhere and from asking around. Mxnet is concentrating primarily on 64-bit, which is why it’s relevant: we want to be able to do things like sparse Cholesky factorization efficiently, which is challenging if not impossible in single precision.
We’re only talking to the mxnet folks so far about adding sparse matrix functionality. It sounds like it’ll be orthogonal to whatever we do with you guys. But if they use CUDA and you use OpenCL, it’ll add yet another dependency and probably restrict us to using either sparse or dense operations and not mixing them if that’s even possible.
Everyone keeps telling us all these dependencies are simple, but they’ve proven to be a huge pain for us to manage through R and Python. I don’t know that we’ll even try to get GPUs working through anything other than our CmdStan interface on Linux.
Hopefully everyone like you who knows more about this than I do will be in on any decision to consolidate efforts, but that’s a long way off.
No promises on any long-term support. We just don’t have the staff to make those kinds of commitments. You’re going to be the expert in Stan Math and GPU code, so I don’t know what you’re expecting from the other Stan devs here. We will continue to answer questions about the math lib for everyone.
I bought AMD GPUs out of distaste for proprietary CUDA, which only runs on one manufacturer’s hardware, and more broadly because of how restrictive developing a code base in that language becomes.
Although there are tools that make porting CUDA easier, e.g. HIP.
As for deep learning, at the consumer level AMD Vega GPUs offer excellent Float16 performance. I believe NVIDIA only offers that for their professional GPUs that cost several times more per teraflop. Although I’m more interested in Float32, where both do well.
As for 64-bit: isn’t it only an extremely small subset of GPUs (i.e., NVIDIA’s Tesla series) that performs well at all?
The requirements are:

- a C++11 compiler (supporting at least -std=c++0x)
- the OpenCL shared library (provided by an SDK such as AMD’s or NVIDIA’s)
- the OpenCL headers, including the C++ header file (provided by Khronos if not by the SDK)
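As a concrete check of exactly those three dependencies, a minimal probe like the following compiles with a C++11 compiler against the Khronos C++ header and links against the OpenCL shared library (e.g. `g++ -std=c++11 probe.cpp -lOpenCL`); it just lists the available platforms and devices:

```cpp
#include <CL/cl.hpp>  // Khronos C++ bindings (C++ header file)
#include <iostream>
#include <vector>

int main() {
  std::vector<cl::Platform> platforms;
  cl::Platform::get(&platforms);  // enumerate installed OpenCL platforms
  for (const auto& p : platforms) {
    std::vector<cl::Device> devices;
    p.getDevices(CL_DEVICE_TYPE_ALL, &devices);
    for (const auto& d : devices)
      std::cout << p.getInfo<CL_PLATFORM_NAME>() << ": "
                << d.getInfo<CL_DEVICE_NAME>() << '\n';
  }
}
```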
If the user doesn’t have OpenCL installed locally, they get a compiler error.
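One common way to soften that is to compile the OpenCL path only when the user explicitly opts in; a sketch, where the STAN_OPENCL macro name is my assumption rather than anything agreed on:

```cpp
// Sketch: keep the OpenCL dependency optional behind a build flag
// (STAN_OPENCL is a hypothetical name), so machines without OpenCL
// still build the CPU-only path without errors.
#ifdef STAN_OPENCL
#include <CL/cl.hpp>  // only opted-in builds ever see this include
#endif

inline bool gpu_enabled() {
#ifdef STAN_OPENCL
  return true;   // OpenCL path compiled in
#else
  return false;  // plain CPU build; no OpenCL installation required
#endif
}
```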
It may be, but that’s what we need. The kinds of matrix calculations we’re doing are barely stable with 64 bits, and aren’t stable enough with 32 bits.
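To illustrate the precision point, here is a small sketch (assuming Eigen, which we already ship via RcppEigen): Cholesky-factoring an ill-conditioned Hilbert matrix typically fails outright in single precision while double precision still gets through:

```cpp
// The Hilbert matrix H(i,j) = 1/(i+j+1) is symmetric positive-definite
// but ill-conditioned (cond ~ 1.6e13 at n = 10), so its Cholesky
// factorization typically breaks down in float but succeeds in double.
#include <Eigen/Dense>
#include <iostream>

template <typename T>
const char* cholesky_status(int n) {
  Eigen::Matrix<T, Eigen::Dynamic, Eigen::Dynamic> H(n, n);
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < n; ++j)
      H(i, j) = T(1) / T(i + j + 1);
  Eigen::LLT<Eigen::Matrix<T, Eigen::Dynamic, Eigen::Dynamic> > llt(H);
  return llt.info() == Eigen::Success ? "ok" : "failed";
}

int main() {
  std::cout << "float:  " << cholesky_status<float>(10) << '\n'   // typically "failed"
            << "double: " << cholesky_status<double>(10) << '\n'; // "ok"
}
```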