OpenCL Async performance

seantalts · April 29, 2019, 9:13pm

I’ve been reading a little more about OpenCL’s async facilities, and the material I found suggests using multiple command queues for the best performance, but I think we’re only using one (and have it hardcoded in a global singleton). Does anyone know how much additional performance we’d get from using multiple queues, or what that implementation might look like? Tagging usual GPU suspects @rok_cesnovar and @stevebronder :)

references:

stevebronder · April 29, 2019, 9:51pm

Yeah, this is a thing I see pretty frequently. I think that advice is for older devices that do not support async within a queue. That’s why the above and the Stack Overflow questions mentioning this are from 8-10 years ago.

Async works within a queue. If the device doesn’t support it (like in older systems), the way to get around not having actual async was to have separate queues for reading and queues for writing. We could do this, but then it’s just more queue management overhead.

idk if Rok has more thoughts but that’s my general understanding

rok_cesnovar · April 30, 2019, 7:26am

That is my understanding as well. This is a workaround for when async is not supported, which was more common back then.
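
On the "how much additional performance" part of the question, here is a rough back-of-the-envelope sketch rather than a measurement. It is a toy timeline model in plain C++ (no OpenCL calls) comparing fully serialized write/kernel/read commands against commands that overlap, however the overlap is obtained (async within one queue or separate queues). The `ChunkCost` struct, both function names, and the millisecond costs are hypothetical, and the model assumes the device can run one upload, one kernel, and one download at the same time, which is not true of every device.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical per-chunk costs in milliseconds (made-up numbers, not measurements).
struct ChunkCost {
  double write_ms;   // host -> device transfer
  double kernel_ms;  // kernel execution
  double read_ms;    // device -> host transfer
};

// No overlap: every command waits for the previous one to finish.
double serialized_ms(const ChunkCost& c, int n_chunks) {
  assert(n_chunks >= 0);
  return n_chunks * (c.write_ms + c.kernel_ms + c.read_ms);
}

// Overlap: uploads, kernels, and downloads each run on their own engine
// (an assumption). A chunk's kernel waits for that chunk's upload, and its
// download waits for its kernel.
double overlapped_ms(const ChunkCost& c, int n_chunks) {
  assert(n_chunks >= 0);
  double write_done = 0.0;   // time the upload engine becomes free
  double kernel_done = 0.0;  // time the compute engine becomes free
  double read_done = 0.0;    // time the download engine becomes free
  for (int i = 0; i < n_chunks; ++i) {
    write_done += c.write_ms;
    kernel_done = std::max(kernel_done, write_done) + c.kernel_ms;
    read_done = std::max(read_done, kernel_done) + c.read_ms;
  }
  return read_done;
}
```

With the made-up costs `ChunkCost{2.0, 5.0, 2.0}` and four chunks, the serialized timeline is 36 ms and the overlapped one is 24 ms. In this model the gain comes entirely from hiding transfers behind kernels, so it shrinks toward nothing when kernels are much longer than transfers or when there is only one chunk.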
