We started talking about this on an unrelated PR on github and I wanted to bring it to discourse.
We’re in the planning phases of trying to get the stanc3 compiler to emit models that can take better advantage of the GPU functionality @rok_cesnovar @stevebronder @tadej et al have been building into Stan Math. The most obviously annoying issue is that right now we are forced to copy data matrices to and from the GPU every leapfrog step even though it is impossible for them to change. Rok mentioned this is important to them,
Can we help in any way with the compiler support? This is probably the no.1 priority for our team, the other one being the matrix_cl. Will this be a flag on the Eigen matrix that will mean this Matrix’s contents are immutable?
I’m thinking the only thing we’d need from the Math library would be the GPU function overloads and a list of them that we can use during compilation. There are a few ideas for how we could use this, perhaps the simplest would be that whenever we see a call to one of these functions with a data or parameter argument, e.g. y ~ bernoulli_logit_glm(x, alpha, beta)
where x and y are data
, we can create a new variable x_opencl__ = to_opencl(x);
(and same for y) in the transformed data block and then replace the call with y_opencl__ ~ bernoulli_logit_glm(x_opencl__, alpha, beta)
). I think @Matthijs might have originally come up with this a few months ago. This seems super easy to do and I think it’s fairly general and solves our lowest hanging fruit… definitely open to other ideas!
Would it help if we were present for the meeting you mentioned on discourse?
I think the most helpful thing would be a regularized list of function signatures for GPU functions noting which arguments are overloaded for matrix_cl. That meeting is a little more about how to abstract this idea along with a few other ideas, but in retrospect I don’t think we need to solve that abstraction first.
There’s a few details that need to be filled in:
- Are current overloads all-or-nothing? Meaning if
x
andy
arematrix_cl
butalpha
andbeta
aren’t in the above example, what happens? I think if so we can just generateto_opencl
assignments for everything and then have dead code elimination remove any redundant ones. - Do any of the functions change their arguments?
- For functions that return a matrix, is it always a matrix_cl?
I think this plan actually sounds pretty feasible and robust even though it’s extremely simple. I should be able to spend hopefully 2 days a week on this - it’s a really killer feature for the new compiler and really showcases our GPU support so I’m really excited to work on it. If any of y’all are excited to dive into the new compiler, I’d be happy to pair with you on it! My friend has been bugging me to start a twitch stream of me programming; this might be a good first go at it :P