Hi all, I’m thinking of using Stan for a GP problem, but I would need GPU speedups. Does Stan currently use the GPU for computing multivariate normal densities or Cholesky decompositions? I think those computations were still on the CPU last year.
Whether to use the GPU for linear algebra operations is configurable when the model is compiled; how to do that may vary by interface. I don’t think we have a parallel option for the normal densities themselves, but if you’re willing to build the density out of parts, I believe the Cholesky decomposition can be run on the GPU. Having said that, we’re running double precision on the GPU, and the autodiff memory overhead for a large covariance matrix is likely high. We have some ideas on how to speed up GPs and their covariance matrix construction, but they’re not in Stan yet (the idea is to use comprehensions to create the covariance matrix and then use a struct-of-arrays layout for memory locality and matrix efficiency).
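To make "build it out of parts" concrete, here is a minimal sketch in plain Python (not Stan, and not how Stan implements it) of the multivariate normal log density computed from a Cholesky factor. The function names `cholesky`, `forward_solve`, and `mvn_logpdf` are made up for illustration. In a Stan program the analogous pieces would be `cholesky_decompose` followed by `multi_normal_cholesky`; the factorization is the O(n^3) step, so it is the part that would benefit from a GPU.

```python
import math


def cholesky(a):
    """Return lower-triangular L with L L^T = a (a symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L z = b for z by forward substitution."""
    z = []
    for i in range(len(b)):
        s = sum(L[i][k] * z[k] for k in range(i))
        z.append((b[i] - s) / L[i][i])
    return z


def mvn_logpdf(y, mu, cov):
    """Multivariate normal log density assembled from the Cholesky factor."""
    n = len(y)
    L = cholesky(cov)  # the expensive O(n^3) step
    z = forward_solve(L, [yi - mi for yi, mi in zip(y, mu)])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))  # log|cov|
    quad = sum(zi * zi for zi in z)  # (y - mu)^T cov^{-1} (y - mu)
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)
```

Once the factor is available, the log determinant and the quadratic form cost only O(n) and O(n^2), which is why the decomposition is the natural piece to offload.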
Now I see why this doc is so confusing. We’re using OpenCL for some of the distribution code. There’s also GPU code under some of our matrix functions that isn’t documented here. I’m not sure where the doc is for those.
There’s a cryptic comment from @rok_cesnovar here on the forums a couple of years ago:
Looks like it used to exist but was removed? @stevebronder or @rok_cesnovar should know the current status. I’m not even sure where the doc is to turn on GPU execution of matrix functions.