GPU integration for rstan 2.19.2


Could someone direct me to any documentation on what software to install on Windows 10 to leverage the GPU integration in rstan 2.19.2?

I am using an NVIDIA Quadro M1200 graphics card.



Found it at:


I had similar questions. I can’t directly help you with rstan on Windows, but here is the closest thing I’ve found to useful information. After a fun couple of days getting the CUDA libraries and toolsets installed and working on a PC running Fedora 30 with a lowish-end Nvidia graphics card, I was able to run the C++ code and R scripts in the link on a freshly installed version of CmdStan.

I have one Stan model that I thought was a candidate for improvement, since it’s a Gaussian process model in 2D with code mostly lifted directly from various case studies/tutorials, but it shows no difference in performance at all. There was a comment somewhere that there’s a threshold model size for using the GPU, and those sample C++ programs manipulate that to force either GPU or CPU utilization, but I have no idea how to set that from any of the higher-level interfaces.

I also tried adding the recommended compiler directives in my ~/.R/Makevars file, but all I’m getting is about a 10% performance penalty.

I have limited knowledge of the rstan installation process, so I can’t offer much assistance there. I am actually not quite sure how far along the integration of cmdstan’s GPU features in rstan is. Will look into it.

For now, please refer to the cmdstan installation instructions. See the PDF files under either 2.19.* or the 2.20 release. Inside the manual is also the link to the wiki @Raghu_Naik has posted above.

Do note that with 2.19.* you should only see speedups if your model uses cholesky_decompose. Optimizations for mdivide_left_tri were added in 2.20. Support for other optimizations is still in the reviewing process and on track for 2.21.

While this link does offer some general installation instructions, this repository was created for replicating the results for a paper we submitted and is overkill. The R scripts actually still call cmdstan.

If you follow the installation instructions in the cmdstan release guides you should be able to run the cmdstan test for OpenCL. The test is executed using
python runCmdStanTests.py src/test/interface/opencl_test.cpp
or
./runCmdStanTests.py src/test/interface/opencl_test.cpp

If these tests run successfully you are good to go. If these instructions did not suffice, please let me know. They were tested with NVIDIA + Win/Ubuntu/CentOS and AMD + Win/Ubuntu.

The user does not need to set any limits. Currently, for cholesky_decompose we switch to the GPU if N>1250. For mdivide_left_tri the limit is N>100, where N represents the input square matrix size.
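To make the switching rule concrete, here is a minimal Python sketch of the decision as described in this post. It is only an illustration, not the Stan Math code: the names GPU_MIN_SIZE and runs_on_gpu are made up for this example, and the numbers are simply the thresholds quoted above, which may change between releases.

```python
# Thresholds as quoted in this thread. GPU_MIN_SIZE and runs_on_gpu are
# hypothetical names for illustration; the real limits live inside Stan Math.
GPU_MIN_SIZE = {
    "cholesky_decompose": 1250,
    "mdivide_left_tri": 100,
}


def runs_on_gpu(function_name: str, n: int) -> bool:
    """Return True if an N x N input would go to the GPU under the rule N > threshold."""
    threshold = GPU_MIN_SIZE.get(function_name)
    if threshold is None:
        # Functions not listed here are assumed to stay on the CPU.
        return False
    return n > threshold


if __name__ == "__main__":
    # Print where each function would run for a few matrix sizes.
    for name in GPU_MIN_SIZE:
        for n in (100, 371, 1250, 2000):
            target = "GPU" if runs_on_gpu(name, n) else "CPU"
            print(f"{name:20s} N={n:5d} -> {target}")
```

Under these numbers a matrix with N=371 stays on the CPU for cholesky_decompose but would go to the GPU for mdivide_left_tri in a release that supports it.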

My impression is that you just install rstan from CRAN and put the necessary things into ~/.R/Makevars to tell it to use the GPU. But I haven’t tried it yet.

OK, thanks for the hint. I have a Stan model that I thought might be a candidate for speedup on a GPU – it’s a GP model with code largely lifted from various Stan tutorials and therefore uses both cholesky decomposition and m_divide_left. But the particular data set I was testing with only has N=371.

So, just for fun I drilled pretty deep into the stan_math source, found the lines where these are set (with a comment promising TODO something different in the future), changed them to 0, rebuilt cmdstan, rebuilt my model, and…

The gradient evaluation time and total execution time doubled using OpenCL. So for my particular GPU/CPU/data combination there’s no advantage to using the GPU. That exercise satisfied my curiosity for now.

Yeah, for N<1000 we don’t expect speedups for cholesky_decompose. For some other functions like mdivide_left_tri you should see speedups for N>100 with a decent GPU. But that is not available in 2.19.