Won’t we also need a dependency on the OpenCL GPU library?
We’ve been talking to some mxnet developers about using their sparse matrix libraries, if they end up implementing them with derivatives in a way that doesn’t pull in a dependency on all of mxnet. They went CUDA-only, claiming that OpenCL's performance wasn't good enough to be worth coding for. I have no idea what the reality is here. Mxnet is nice for us in that it supports double-precision arithmetic.