For MPI we now have a proposal to manage immutable data in a distributed way. From Bob’s blog post I gathered that this could also be of interest for GPU computing, if I understood his comments correctly. Do we need to account for this in the design now?
I can well imagine that for GPs (for example) it would be quite attractive to copy the immutable data to the GPU just once and then reuse it for each iteration. That may give another performance boost.
I think we can follow the same design principle, yes. So in essence:
the gpu/mpi function gets called with the immutable data and a uid
if the uid has not yet been seen, then the immutable data is distributed for MPI/sent to the GPU
the distribution ensures that the uid will be recognized next time the function is called
after the first call of the function we will assume that the same uid means the same data
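To make the steps above concrete, here is a minimal sketch of a uid-keyed cache. All names (RemoteBuffer, ImmutableDataCache, get_or_transfer) are hypothetical and not part of Stan; the "transfer" is just a local copy standing in for a GPU upload or an MPI distribution.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for data living on a GPU or on remote MPI workers.
struct RemoteBuffer {
  std::vector<double> values;
};

// Hypothetical registry: one entry per uid, filled on first use only.
class ImmutableDataCache {
 public:
  // Returns the cached copy for `uid`. The data is transferred only the
  // first time a uid is seen; afterwards the same uid is assumed to mean
  // the same data, so `data` is ignored on later calls.
  const RemoteBuffer& get_or_transfer(int uid,
                                      const std::vector<double>& data) {
    auto it = cache_.find(uid);
    if (it == cache_.end()) {
      ++transfers_;  // a real version would copy to the GPU / distribute via MPI
      it = cache_.emplace(uid, RemoteBuffer{data}).first;
    }
    return it->second;
  }

  // Number of actual transfers performed so far.
  std::size_t transfers() const { return transfers_; }

 private:
  std::unordered_map<int, RemoteBuffer> cache_;
  std::size_t transfers_ = 0;
};
```

Calling get_or_transfer twice with the same uid performs one transfer and returns the same buffer both times. In a design like the one discussed here, the cache would probably sit behind a singleton (for instance a function-local static), which is an assumption on my side rather than something settled in this thread.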
The only catch is that we will end up with more and more singletons floating around in our code base. I think the GPU calls can even be nested inside MPI calls when done this way.
A GPU version of cov_exp_quad_cholesky should be super handy for GPs, I think.
That’s exactly why they’re starting where they’re starting—with Cholesky factorization. It’s O(N^2) data but O(N^3) computations. Hopefully we’ll be able to get to the point where we can pass a data matrix (N^2) once and then calculate a matrix-vector product efficiently using the GPU.
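To illustrate the O(N^2) data versus O(N^3) computation point, here is a plain CPU sketch of a textbook Cholesky factorization. It is not Stan's or any GPU implementation, and the name cholesky_lower and the row-major layout are assumptions for illustration only. The input is N*N numbers, while the three nested loops do on the order of N^3 multiply-adds.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Lower-triangular Cholesky factor of a symmetric positive-definite
// n x n matrix stored row-major, so that A = L * L^T.
// No check for positive definiteness is done in this sketch.
std::vector<double> cholesky_lower(const std::vector<double>& a,
                                   std::size_t n) {
  std::vector<double> l(n * n, 0.0);
  for (std::size_t i = 0; i < n; ++i) {        // row of L
    for (std::size_t j = 0; j <= i; ++j) {     // column of L
      double sum = a[i * n + j];
      for (std::size_t k = 0; k < j; ++k) {    // innermost loop: O(N^3) total
        sum -= l[i * n + k] * l[j * n + k];
      }
      l[i * n + j] = (i == j) ? std::sqrt(sum) : sum / l[j * n + j];
    }
  }
  return l;
}
```

Because the work grows one power of N faster than the data, the cost of moving the matrix to the GPU matters less and less as N grows.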
Yup. In fact, for cov_exp_quad_cholesky there is no urgent need to cache the data on the GPU, since you only need to transfer N data items to define the N^2 matrix. Of course, transferring the N data items just once is even better.
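As a rough sketch of why only N items need to be sent: the squared-exponential kernel k(x_i, x_j) = alpha^2 * exp(-(x_i - x_j)^2 / (2 * rho^2)) builds all N^2 entries from the N inputs. The function below is a hypothetical 1-D CPU version (the name cov_exp_quad_1d and the row-major layout are assumptions, not Stan's signature).

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Builds the n x n squared-exponential covariance matrix (row-major)
// from n scalar inputs. alpha is the marginal standard deviation,
// rho the length scale.
std::vector<double> cov_exp_quad_1d(const std::vector<double>& x,
                                    double alpha, double rho) {
  const std::size_t n = x.size();
  std::vector<double> k(n * n);
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = 0; j <= i; ++j) {
      const double d = x[i] - x[j];
      const double v = alpha * alpha * std::exp(-d * d / (2.0 * rho * rho));
      k[i * n + j] = v;  // fill both triangles, the matrix is symmetric
      k[j * n + i] = v;
    }
  }
  return k;
}
```

On a GPU the same idea would mean uploading x (N values) and building the matrix on the device, so the N^2 matrix never has to cross the bus.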
I am going to remind you of that at a Stan meeting once I have merged and run those two branches… I agree that those two techniques each have huge potential on their own, and taken together they make Stan a new beast. However, you need serious hardware to get this going, and a lot of time to get it to compile, I guess.
Sure, nobody’s going to get faster models on their notebooks. We designed Stan to solve hard problems, and this will really push the frontier of what we can solve!