OpenCL async API clarification

seantalts · April 24, 2019, 8:56pm

Hey all,

@stevebronder and I have been trying to figure out part of the OpenCL 1.2 async API for a little while now. Many of the commands, such as enqueueWriteBuffer, take a pointer to a collection of events that the command should wait for:

cl_int cl::CommandQueue::enqueueWriteBuffer(
  const Buffer& buffer,
  cl_bool blocking_write,
  ::size_t offset,
  ::size_t size,
  const void * ptr,
  const VECTOR_CLASS<Event> * events = NULL,
  Event * event = NULL)

I’m a little worried about that pointer because we can’t find anywhere in the spec exactly what implementations do with it (and the underlying implementations don’t seem to be open source). We would love it if they copied the events as soon as you pass them in, but they could technically try to access that pointer later, after the call to enqueueWriteBuffer has returned.

I wrote a little sample program that tries to stress this and ran it with clang’s AddressSanitizer, and it came out clean (well, it says my OpenCL implementation has a variety of unrelated memory leaks, but they don’t seem important).

Here’s that code, most of which is copied from a tutorial. The important case is on line 57, in schedWriteEvents, which creates a std::vector<cl::Event> at local scope and passes it to enqueueWriteBuffer. If the implementation copies the events immediately, this should be fine (and it seems to work with my local OpenCL implementation). But if it keeps that pointer past the return of schedWriteEvents and the subsequent destruction of the local vector of events, it would be accessing freed memory.

Anyone know if this kind of API is common? I suspect it’s just a relic of wrapping the underlying C API, and that it probably does the right thing, because it’s hard for me to imagine how we would be expected to manage the lifetimes of those collections otherwise (I mean, it’s possible, but pretty nasty).

stevebronder · April 25, 2019, 2:11am

I was a bit bored, so I started digging into this deeper.

pocl is an open-source implementation of OpenCL; I’ve linked the relevant parts of its source below.

We would love it if they copied the events as soon as you pass them in, but they could technically try to access that pointer later, after the call to enqueueWriteBuffer has returned.

Turns out they do! The interesting part happens in the kernel calls here, where the input event list is (shallowly) copied into a new event list.

Also, here is the clReleaseEvent code, with the macros it uses defined here. So we can see it just decrements the reference count, and only frees the event once that count reaches 0.

rok_cesnovar · April 25, 2019, 5:38am

Intel has open-sourced their runtime and everything else OpenCL-related. I will check if I can find what they do.

seantalts · April 26, 2019, 12:03pm

I think this must just be a relic of the underlying C API, and that they intend to copy over the collection at the time the async call is scheduled. It would be nice to confirm that Intel also does this if you have a chance.

rok_cesnovar · April 26, 2019, 2:04pm

Here they create an EventsRequest object and call cpuDataTransferHandler() here.
In that function they call waitForEvents in this file.

The copy happens here, I believe.

seantalts · April 26, 2019, 8:34pm

That all looks right & makes sense to me. Let’s go with that assumption and not worry about managing that memory ourselves. Phew!
