I started reading through the OpenCL implementation in the Math library. I mostly expected some OpenCL kernel sources and interop code, but then I saw that some kernel fusion is happening to amortize memory access over multiple simple operations, which is pretty cool.
It occurred to me that this sort of thing would be easier to do in the compiler (the new OCaml one is the only one I’ve read through a bit). The MIR or the backend seems like the place where one has a lot of high-level information and could fuse lots of arithmetic into a single fat kernel. It would also be where one could speculatively generate forward-gradient kernels for some functions.
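To make the fusion idea a bit more concrete, here is a toy sketch in OCaml of what such a backend pass might do: walk a tree of elementwise operations and emit a single OpenCL kernel in which intermediates stay in registers, rather than launching one kernel per operation and writing each intermediate out to global memory and reading it back. The `expr` type and the `to_cl`, `inputs`, and `fuse_kernel` functions are all hypothetical and are not part of stanc3's MIR or the Math library. It assumes a device with double-precision support and ignores bounds checks, broadcasting, and gradients.

```ocaml
(* Hypothetical expression type for elementwise vector operations.
   This is a toy stand-in, NOT stanc3's actual MIR. *)
type expr =
  | Input of string   (* a named input vector in global memory *)
  | Const of float
  | Add of expr * expr
  | Mul of expr * expr
  | Exp of expr

(* Render an expression as an OpenCL C expression for element [i].
   Every use of an input emits its own load; no deduplication is done. *)
let rec to_cl = function
  | Input name -> Printf.sprintf "%s[i]" name
  | Const c -> Printf.sprintf "%.17g" c
  | Add (a, b) -> Printf.sprintf "(%s + %s)" (to_cl a) (to_cl b)
  | Mul (a, b) -> Printf.sprintf "(%s * %s)" (to_cl a) (to_cl b)
  | Exp a -> Printf.sprintf "exp(%s)" (to_cl a)

(* Collect the distinct input names in order of first appearance;
   these become the kernel's buffer parameters. *)
let inputs e =
  let rec go acc = function
    | Input n -> if List.mem n acc then acc else acc @ [ n ]
    | Const _ -> acc
    | Add (a, b) | Mul (a, b) -> go (go acc a) b
    | Exp a -> go acc a
  in
  go [] e

(* Emit ONE kernel for the whole expression tree: each work-item computes
   its element with all intermediates held in private variables, and
   writes only the final result back to global memory. *)
let fuse_kernel name e =
  let params =
    inputs e
    |> List.map (fun n -> Printf.sprintf "__global const double* %s" n)
    |> String.concat ", "
  in
  Printf.sprintf
    "__kernel void %s(%s, __global double* out) {\n  const int i = get_global_id(0);\n  out[i] = %s;\n}\n"
    name params (to_cl e)

(* Usage: fuse x .* y + exp(x) into a single kernel and print its source. *)
let () =
  let e = Add (Mul (Input "x", Input "y"), Exp (Input "x")) in
  print_string (fuse_kernel "fused" e)
```

Because `x[i]` is emitted twice here, a real pass would probably want common-subexpression elimination on the tree before generating code, and it would need a matching step to produce reverse- or forward-mode gradient kernels.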
I guess all of this has already been thought about, but I’m left wondering whether there’s a roadmap, since ultimately I’d like to contribute. I didn’t find anything, or maybe I didn’t look in the right places. Thanks in advance!