Speaking of too clever, tbh I don’t think it would be that hard to write a preprocessor for the kernel code. We already have about 90% of the machinery needed to manage that.
Let’s say we wanted to do this for add, because it’s simple:
__kernel void add(__global double *C, __global double *A,
                  __global double *B, int rows, int cols) {
  const int i = get_global_id(0);
  const int j = get_global_id(1);
  // C(i, j), A(i, j), B(i, j) are element-access macros defined outside this snippet
  if (i < rows && j < cols) {
    C(i, j) = A(i, j) + B(i, j);
  }
}
Now let’s add templating:
// Not valid OpenCL C on its own: the proposed preprocessor consumes this line
template <typename T1, typename T2, typename T3>
__kernel void add(__global T1 *C, __global T2 *A, __global T3 *B,
                  int rows, int cols) {
  const int i = get_global_id(0);
  const int j = get_global_id(1);
  if (i < rows && j < cols) {
    C(i, j) = A(i, j) + B(i, j);
  }
}
Currently we can’t really do anything with the above, because we set up the kernel to compile when the kernel_cl struct is constructed, before any argument types are known. What we really want is to call code like the following and have a kernel generated for those specific types.
matrix_cl<int> A;
matrix_cl<double> B;
// C's element type is the type that A's and B's element types promote to
matrix_cl<promote<decltype(A)::type, decltype(B)::type>> C;
// Add an int matrix and a double matrix
opencl_kernel::add(C, A, B, A.rows(), A.cols());
The above should call this kernel, whose name is mangled with the element types of C, A, and B:
__kernel void double_int_double_add(__global double *C, __global int *A,
                                    __global double *B, int rows, int cols) {
  const int i = get_global_id(0);
  const int j = get_global_id(1);
  if (i < rows && j < cols) {
    C(i, j) = A(i, j) + B(i, j);
  }
}
And we can do this! We’ve been treating the fact that kernels come in as strings as more of a bug than a feature, but because they are strings, I don’t think it would be hard to write a small preprocessor over them to get what we want.
Roughly, the steps needed for the above to compile and execute are:
- The element type of each matrix_cl<T> needs to be passed to the OpenCL JIT compiler.
- We can just store the type in matrix_cl so it can be accessed like decltype(A)::type.
- The kernel_cl constructor no longer compiles the kernel. It is only given the kernel string and other compile options.
- Compilation can then actually happen JIT, at the call site.
- Make a mangled signature.
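For the first step, one minimal way to get element types to the JIT compiler is a trait that maps each C++ scalar type to its OpenCL C spelling, plus a `type` alias on matrix_cl. This is a sketch under assumptions: `opencl_type_name` is a hypothetical name (not an existing Stan API), the `matrix_cl` below is a stripped-down stand-in, and `promote` is assumed here to behave like `std::common_type_t`.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical trait: the OpenCL C spelling of a C++ scalar type.
template <typename T>
struct opencl_type_name;  // left undefined: unsupported types fail to compile

template <>
struct opencl_type_name<double> {
  static std::string name() { return "double"; }
};
template <>
struct opencl_type_name<float> {
  static std::string name() { return "float"; }
};
template <>
struct opencl_type_name<int> {
  static std::string name() { return "int"; }
};

// Stripped-down stand-in for matrix_cl showing only the element-type part;
// the real class does much more.
template <typename T>
class matrix_cl {
 public:
  using type = T;  // accessible as decltype(A)::type
  matrix_cl(int rows, int cols) : rows_(rows), cols_(cols) {
    assert(rows >= 0 && cols >= 0);
  }
  int rows() const { return rows_; }
  int cols() const { return cols_; }

 private:
  int rows_;
  int cols_;
};

// Assumed definition of `promote`: std::common_type applies the usual
// arithmetic conversions, so (int, double) -> double.
template <typename... Ts>
using promote = std::common_type_t<Ts...>;
```

The type list for the mangled name would then be `opencl_type_name<...>::name()` for C, A, and B, in argument order.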
So for the above, we need to parse the kernel string for a template line directly above a __kernel signature. Cool, we see there are T1, T2, and T3. We also see the matrix_cl’s come in with types double, int, and double. So we start by replacing each template parameter with a unique placeholder and mangling the kernel name with those placeholders:
__kernel void __T1_template___T2_template___T3_template_add(
    __global __T1_template *C, __global __T2_template *A,
    __global __T3_template *B, int rows, int cols) {
  const int i = get_global_id(0);
  const int j = get_global_id(1);
  if (i < rows && j < cols) {
    C(i, j) = A(i, j) + B(i, j);
  }
}
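The parse-and-rename pass described above could be sketched with `<regex>`, assuming the template line always sits directly above the `__kernel` signature and each parameter is declared as `typename X`. The `templated_kernel` struct and `extract_template` function are hypothetical names for illustration only.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Result of the first preprocessor pass (hypothetical structure).
struct templated_kernel {
  std::vector<std::string> params;  // template parameter names, e.g. T1, T2, T3
  std::string source;               // kernel source rewritten with placeholders
};

// Finds `template <typename T1, ...>` directly above a __kernel, records the
// parameter names, deletes the template line, renames each parameter Tk to
// __Tk_template, and prefixes the kernel name with those placeholders.
templated_kernel extract_template(const std::string& src) {
  templated_kernel out;
  static const std::regex tmpl_re(R"(template\s*<([^>]*)>\s*(?=__kernel))");
  std::smatch m;
  if (!std::regex_search(src, m, tmpl_re)) {
    out.source = src;  // not templated: pass through unchanged
    return out;
  }
  static const std::regex param_re(R"(typename\s+(\w+))");
  const std::string plist = m[1].str();
  for (std::sregex_iterator it(plist.begin(), plist.end(), param_re), end;
       it != end; ++it) {
    out.params.push_back((*it)[1].str());
  }
  // Drop the template line, keeping everything before and after it.
  std::string body = m.prefix().str() + m.suffix().str();
  std::string name_prefix;
  for (const auto& p : out.params) {
    // Whole-word match, so T1 does not also rewrite T10 or T1_foo.
    body = std::regex_replace(body, std::regex("\\b" + p + "\\b"),
                              "__" + p + "_template");
    name_prefix += "__" + p + "_template_";
  }
  // add -> __T1_template___T2_template___T3_template_add
  static const std::regex name_re(R"(__kernel\s+void\s+(\w+)\s*\()");
  out.source = std::regex_replace(body, name_re,
                                  "__kernel void " + name_prefix + "$1(");
  return out;
}
```

Matching whole words matters: a naive substring replace of `T1` would also hit any identifier that merely contains it.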
Now we know the template parameters, the signature is in the right shape, and we know which concrete type each parameter should be. So we can loop over the template parameters to clean everything up: essentially a gsub replacing __T1_template with double, __T2_template with int, and so on.
// Result after substitution
__kernel void double_int_double_add(__global double *C, __global int *A,
                                    __global double *B, int rows, int cols) {
  const int i = get_global_id(0);
  const int j = get_global_id(1);
  if (i < rows && j < cols) {
    C(i, j) = A(i, j) + B(i, j);
  }
}
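The gsub step described above can be sketched as a plain substring replacement, assuming the placeholder form `__Tk_template` from the previous pass and a parallel list of concrete OpenCL type names taken from the call site. `instantiate` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Replace each placeholder __Tk_template with its concrete OpenCL type.
// params[k] is bound to types[k], e.g. {"T1","T2","T3"} -> {"double","int","double"}.
std::string instantiate(std::string src, const std::vector<std::string>& params,
                        const std::vector<std::string>& types) {
  assert(params.size() == types.size());
  for (std::size_t k = 0; k < params.size(); ++k) {
    const std::string placeholder = "__" + params[k] + "_template";
    // Substring (not whole-word) replace: in the mangled kernel name the
    // placeholders are glued together with underscores.
    std::size_t pos = 0;
    while ((pos = src.find(placeholder, pos)) != std::string::npos) {
      src.replace(pos, placeholder.size(), types[k]);
      pos += types[k].size();
    }
  }
  return src;
}
```

Because the placeholders were baked into the kernel name, the same substitution turns `__T1_template___T2_template___T3_template_add` into `double_int_double_add`, which can also serve as the lookup key for compiled kernels.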
Cool! Done deal! Now we need to make sure we don’t compile that kernel twice:
- Compile the kernel and store it in a map inside opencl_context (key = mangled signature, value = compiled kernel).
- When we call a kernel, check the map for <types>_kernel_name. If it exists, just make a copy of it and pass it to that kernel_cl, then set the kernel args and execute it. Otherwise, compile it first.
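The compile-once logic above can be sketched as a small cache keyed by the mangled signature. This is only an illustration under assumptions: `kernel_cache` is a hypothetical class that could live inside opencl_context, `Kernel` stands in for the real kernel handle type, and the `compile` callback stands in for building the program from source. None of it is the actual OpenCL or Stan API.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical compile-once cache. Kernel stands in for the real kernel
// handle; compile stands in for building a program from source.
template <typename Kernel>
class kernel_cache {
 public:
  using compile_fn = std::function<Kernel(const std::string& source,
                                          const std::string& name)>;
  explicit kernel_cache(compile_fn compile) : compile_(std::move(compile)) {
    assert(compile_);  // a compile callback is required
  }

  // key is the mangled signature, e.g. "double_int_double_add", which is
  // also the kernel's name inside the instantiated source. Returns a copy
  // of the cached kernel, compiling only on the first request for key.
  Kernel get(const std::string& key, const std::string& source) {
    auto it = kernels_.find(key);
    if (it == kernels_.end()) {
      it = kernels_.emplace(key, compile_(source, key)).first;
    }
    return it->second;
  }

  std::size_t size() const { return kernels_.size(); }

 private:
  compile_fn compile_;
  std::map<std::string, Kernel> kernels_;
};
```

The caller would then set the kernel args on the returned copy and enqueue it; only the first call for a given type combination pays the JIT cost.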
I think that makes sense. Does that make sense? “C++ without classes”, I guess.