Overall design of GPU work?

maedoc · August 19, 2020, 9:46pm

I started reading through the implementation of the OpenCL work in the math library. I mostly expected some CL sources and interop code, but then saw that some kernel fusion is happening to amortize memory access over multiple simple operations, which is pretty cool.
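For concreteness, here is a minimal CPU-side sketch of what fusion buys. It is not code from Stan Math; the function names `scale_add_unfused` and `scale_add_fused` are made up for illustration. The unfused version writes a temporary to memory and reads it back, while the fused version touches each input element once.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example: computes alpha * a + b as two separate passes.
// The intermediate `tmp` is written out and then read back in.
std::vector<double> scale_add_unfused(const std::vector<double>& a,
                                      const std::vector<double>& b,
                                      double alpha) {
  assert(a.size() == b.size());
  std::vector<double> tmp(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    tmp[i] = alpha * a[i];  // pass 1: scale
  }
  std::vector<double> out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    out[i] = tmp[i] + b[i];  // pass 2: add
  }
  return out;
}

// Hypothetical example: the same result in a single pass with no temporary.
// On a GPU this corresponds to one kernel launch instead of two.
std::vector<double> scale_add_fused(const std::vector<double>& a,
                                    const std::vector<double>& b,
                                    double alpha) {
  assert(a.size() == b.size());
  std::vector<double> out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    out[i] = alpha * a[i] + b[i];  // scale and add fused together
  }
  return out;
}
```

Both functions return the same values; the difference is only in how many times memory is traversed, which is usually what dominates for simple elementwise operations.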

It occurred to me that this is the sort of thing that might be easier to do in the compiler (the new OCaml one is the only one I’ve read through a bit): the MIR or backend seems like the place where one has a lot of high-level information and can fuse lots of arithmetic into a single fat kernel. It would also be where one could speculatively generate forward gradient kernels for some functions.
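As a rough illustration of the idea, and not a description of how stanc3 or the Stan Math kernel generator actually works, the sketch below walks a tiny expression tree and emits one OpenCL C kernel for the whole expression. The `Expr` type and the helpers `input`, `add`, `mul`, `emit`, and `make_fused_kernel` are all hypothetical names invented for this example.

```cpp
#include <memory>
#include <set>
#include <string>
#include <utility>

// Hypothetical toy AST for elementwise expressions over vectors.
struct Expr {
  enum class Kind { Input, Add, Mul } kind;
  std::string name;                // used by Input nodes only
  std::shared_ptr<Expr> lhs, rhs;  // used by Add and Mul nodes
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr input(const std::string& name) {
  return std::make_shared<Expr>(
      Expr{Expr::Kind::Input, name, nullptr, nullptr});
}
ExprPtr add(ExprPtr l, ExprPtr r) {
  return std::make_shared<Expr>(
      Expr{Expr::Kind::Add, "", std::move(l), std::move(r)});
}
ExprPtr mul(ExprPtr l, ExprPtr r) {
  return std::make_shared<Expr>(
      Expr{Expr::Kind::Mul, "", std::move(l), std::move(r)});
}

// Emits the scalar expression for element i and records which input
// buffers the expression reads.
std::string emit(const ExprPtr& e, std::set<std::string>& inputs) {
  if (e->kind == Expr::Kind::Input) {
    inputs.insert(e->name);
    return e->name + "[i]";
  }
  const char* op = (e->kind == Expr::Kind::Add) ? " + " : " * ";
  std::string l = emit(e->lhs, inputs);
  std::string r = emit(e->rhs, inputs);
  return "(" + l + op + r + ")";
}

// Wraps the whole expression in a single OpenCL C kernel, so the
// arithmetic runs in one launch with no intermediate buffers.
// Assumes the target device supports double precision.
std::string make_fused_kernel(const std::string& kernel_name,
                              const ExprPtr& e) {
  std::set<std::string> inputs;
  std::string body = emit(e, inputs);
  std::string src = "__kernel void " + kernel_name + "(";
  for (const std::string& in : inputs) {
    src += "__global const double* " + in + ", ";
  }
  src += "__global double* out) {\n";
  src += "  const size_t i = get_global_id(0);\n";
  src += "  out[i] = " + body + ";\n";
  src += "}\n";
  return src;
}
```

Calling `make_fused_kernel("fused", add(mul(input("a"), input("b")), input("c")))` yields a kernel whose body is the single assignment `out[i] = ((a[i] * b[i]) + c[i]);`. A compiler backend has the whole model's expression tree available, which is why it is a plausible place for this kind of transformation.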

I guess all of this has already been thought about, but I’m left wondering if there’s a roadmap, since ultimately I’d like to be able to contribute. I didn’t find anything, though I may not have looked in the right places. Thanks in advance!

stevebronder · August 20, 2020, 12:29am

Hey, we would love to have you! There are two pieces of docs:

The Stan Math OpenCL Paper
The Kernel Fusion Docs as well as the other docs under the OpenCL module and “OpenCL for Parallel Computing” under the Parallelism tab.

I’m not sure if we are looking at doing the kernel fusion directly in the compiler. Our current plan is to use the new var_value<matrix_cl<double>> type to handle GPU computation in reverse mode.
