OpenCL 2.0

I don’t think adoption is the problem, more that OpenCL 2.0 has a lot of new features that are taking time to implement correctly.

Not really, but I can summarize some of the notes I’ve found.

So, yeah, I mean it’s actually just Nvidia that’s behind.

For me it’s Shared Virtual Memory (SVM), which lets the host and device operate on the same virtual memory. You can read about it in some detail here. The main benefits are:

  1. If you have an integrated CPU+GPU combo on your computer you don’t pay memory-transfer costs, since both sides share the same physical memory.
  2. You can pass abstract types over to the device. Right now we have to pass over linear, contiguous memory, but with SVM we can pass over pointer-based structures like a whole chunk of the expression tree. So if we had kernels for functions and their derivatives for a whole section of the expression tree, we could move that whole piece over and do everything on the GPU.**
  3. The host and device can use atomic operations on the SVM without transfers, if fine-grained SVM is supported on the device. The extra good thing about this is that if we can figure out a way to coerce SVM and Eigen to play nicely, then we can make the SVM allocation the data backing the Eigen matrices. So we can do GPU stuff, ‘pass’ that data back to Eigen, and Eigen can go about its normal operations. When we want to go do stuff on the GPU it will already know about the changes, so we pay way, way less in transfer costs.
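To make the idea above concrete, here’s a rough host-side sketch of how SVM and an Eigen view could fit together. This is only a sketch under stated assumptions: it assumes an OpenCL 2.0 context/queue/device/kernel already exist (their creation is omitted), and `svm_eigen_sketch` is a hypothetical name, not anything from an actual codebase.

```cpp
// Sketch only: assumes an OpenCL 2.0 runtime and Eigen are available, and
// that context, queue, device, and kernel were set up in the usual way.
#include <CL/cl.h>
#include <Eigen/Dense>

void svm_eigen_sketch(cl_context context, cl_command_queue queue,
                      cl_device_id device, cl_kernel kernel) {
  const size_t rows = 4, cols = 4;
  const size_t bytes = rows * cols * sizeof(double);

  // Query what level of SVM the device supports.
  cl_device_svm_capabilities caps = 0;
  clGetDeviceInfo(device, CL_DEVICE_SVM_CAPABILITIES,
                  sizeof(caps), &caps, nullptr);
  const bool fine_grained = caps & CL_DEVICE_SVM_FINE_GRAIN_BUFFER;

  // Allocate shared memory. With fine-grained SVM the host can touch it
  // directly; coarse-grained SVM would need clEnqueueSVMMap/Unmap around
  // any host access.
  cl_svm_mem_flags flags = CL_MEM_READ_WRITE;
  if (fine_grained) flags |= CL_MEM_SVM_FINE_GRAIN_BUFFER;
  double* ptr = static_cast<double*>(clSVMAlloc(context, flags, bytes, 0));

  // Wrap the SVM allocation in an Eigen view: no copy, Eigen works in place.
  Eigen::Map<Eigen::MatrixXd> A(ptr, rows, cols);
  A.setOnes();  // host-side initialization through Eigen

  // Hand the same pointer to a kernel; no clEnqueueWriteBuffer needed.
  clSetKernelArgSVMPointer(kernel, 0, ptr);
  size_t global = rows * cols;
  clEnqueueNDRangeKernel(queue, kernel, 1, nullptr, &global,
                         nullptr, 0, nullptr, nullptr);
  clFinish(queue);

  // With fine-grained SVM, A now reflects the kernel's writes directly.
  clSVMFree(context, ptr);
}
```

The key point is the `Eigen::Map`: Eigen operates on the SVM pointer in place, so there’s no separate host copy to keep in sync.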

** This is actually better than CUDA’s usual model, where the host needs to keep enqueuing new kernels. With OpenCL 2.0 we can actually have the device enqueue new kernels itself.
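Device-side enqueue looks roughly like this in OpenCL C 2.0 (built with `-cl-std=CL2.0`). This is a sketch for illustration; `parent`, `child_step`, and the buffer layout are hypothetical names, not real kernels.

```c
// Device-side enqueue sketch: a parent kernel launches a child kernel from
// the device's default queue, so the host doesn't round-trip per launch.
kernel void child_step(global double* data) {
  data[get_global_id(0)] *= 2.0;
}

kernel void parent(global double* data, int more_work) {
  if (get_global_id(0) == 0 && more_work) {
    // Enqueue the next stage after this kernel finishes.
    enqueue_kernel(get_default_queue(),
                   CLK_ENQUEUE_FLAGS_WAIT_KERNEL,
                   ndrange_1D(get_global_size(0)),
                   ^{ child_step(data); });
  }
}
```

For tree-shaped work like an expression tree, this means the traversal logic can live on the device instead of being driven launch-by-launch from the host.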

Overall 2.0 is very, very good, though to be honest Rok and I are a bit skeptical because this all sounds too good to be true. I believe Rok is planning on taking the SVM stuff for a test drive soon.