OpenCL + MPI

Bob, my question is about the OpenCL async API, which is relevant not to multi-GPU setups but to multi-process, single-GPU setups. I was asking a more nuanced question than the one you answered. Given that GPU performance scales directly and only with vectorization (i.e. sending more data to the GPU at once to be processed together), it often doesn’t make sense to have multiple processes share a single GPU if you could instead coordinate and put all of the data on the GPU together (as we can).
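
To make the "coordinate and batch" alternative concrete, here is a minimal sketch in plain Python. It is an illustration only, not real OpenCL or MPI code: `run_batched` and `fake_kernel` are hypothetical names, the kernel is a stand-in for an element-wise GPU kernel, and the three chunks stand in for data held by three processes. In a real setup the concatenation step would presumably be an MPI gather onto the rank that owns the GPU, and the split would be the matching scatter.

```python
from itertools import accumulate


def run_batched(chunks, kernel):
    """Concatenate per-process chunks, run `kernel` once, split results back."""
    sizes = [len(c) for c in chunks]
    # One flat buffer: what would go to the GPU in a single transfer.
    batch = [x for c in chunks for x in c]
    out = kernel(batch)  # a single launch over all of the data
    # Offsets marking where each process's slice begins and ends.
    bounds = [0, *accumulate(sizes)]
    return [out[bounds[i]:bounds[i + 1]] for i in range(len(chunks))]


launches = 0


def fake_kernel(data):
    # Stand-in for an element-wise OpenCL kernel; also counts launches.
    global launches
    launches += 1
    return [2 * x for x in data]


chunks = [[1, 2], [3], [4, 5, 6]]  # hypothetical data from three processes
results = run_batched(chunks, fake_kernel)
print(results, launches)
```

The point of the sketch is that each process still gets back exactly its own slice of the output, but the device sees one large launch instead of three small ones.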