Is this talking about the need for placeholder vars so that, when we have multiple grads() running, we don't have any race conditions?
When different threads start nested stacks, are those separate from the main stack? Is this what makes this work?
That’s fine with me.
+1 Excellent New Year reference
I looked at the current C++. Without actually understanding what the TBB stuff does, I like it. I am curious what's possible at the interface level. I guess that'll determine what C++ we need to write. I'm happy to help with the coding here.
Yeah, I am curious here what options we have.
Just to refresh: the inputs would be similar to map_rect's, except now the inner function returns a scalar and we accumulate the results?
I vaguely remember from a long time ago (I think from this thread: Parallel reduce in the Stan language) that TBB doing the scheduling for us led to some different goals for the interface (but I forget what those were).
So roughly we were replacing something like:
for (n in 1:N) {
  target += some_lpdf(as[n], ...);
}
with:
target += parallel_map_reduce(some_lpdf, as, ...)
and there are shared params and whatnot in the (…).
@seantalts and @Bob_Carpenter, the goal here is to provide automatic parallelism for functions that act like lpdfs (all functions return a scalar and they all get added up). I assume what is possible right now looks a lot like map_rect, and eventually we want to take advantage of closures to make the interface clean.
So there are reasons those functions would take all sorts of different argument types, and then we get the packing/unpacking problem.
In the interim, is there anything fancier we can do with an interface like this with the new compiler?