Isn’t this controlled by the users’ invocation of map_rect? That is, aren’t all the map jobs run in parallel?
I think map_rect_concurrent or map_rect_threaded would be a better name, as it’s not fully asynchronous. We’re waiting on a future at some point.
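To make the naming point concrete, here is a minimal sketch of the "concurrent but not fully asynchronous" pattern using plain C++11 std::async. The helper name map_concurrent is hypothetical and is not Stan’s implementation; it only shows that the caller still blocks on each future before it can return.

```cpp
#include <cassert>
#include <future>
#include <vector>

// Hypothetical helper, not part of Stan: runs f on each input as its own
// asynchronous task, then blocks until every result is available.
template <typename F>
std::vector<double> map_concurrent(const F& f, const std::vector<double>& xs) {
  std::vector<std::future<double>> futures;
  futures.reserve(xs.size());
  for (double x : xs) {
    // std::launch::async forces each job onto its own thread.
    futures.push_back(std::async(std::launch::async, f, x));
  }

  std::vector<double> results;
  results.reserve(xs.size());
  for (auto& fut : futures) {
    // get() blocks here, which is why the call as a whole is concurrent
    // rather than asynchronous from the caller's point of view.
    results.push_back(fut.get());
  }
  return results;
}
```

The jobs overlap with each other, but the function does not return until the last future has been collected.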
The algorithms need to know what the parameters are; at least ode_integrate does, in order to create the coupled system. How do we do that with closure-bound parameters?
Yup. I usually just wind up with multiple copies of cmdstan floating around, then never know where to find the one I want since I’m not used to multiple copies.
That’s all it ever did, but there used to be a pretty big slowdown just for doing that.
Yes, and I hope we get there in the design. I think getting the API right and picking the low-hanging fruit first is the right strategy for moving forward.
I would prefer to have this configurable, too. We want people who run four chains in parallel on a four-core machine to not also try to multi-thread four ways within each of those chains.
Sounds like the right thing to do.
Is the idea that if we have only STAN_THREADS = 4 and we have a map_rect call with 20 jobs, we make sure we only run 4 threads at a time?
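One way such a cap could work, sketched in plain C++11: a fixed number of worker threads pull job indices from a shared atomic counter, so 20 jobs with a limit of 4 never have more than 4 running at once. The function run_jobs_capped and its max_threads parameter are hypothetical names for illustration and do not describe how map_rect is actually implemented.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: runs every job, using at most max_threads threads.
std::vector<double> run_jobs_capped(
    const std::vector<std::function<double()>>& jobs,
    std::size_t max_threads) {
  std::vector<double> results(jobs.size());
  std::atomic<std::size_t> next{0};  // index of the next unclaimed job

  auto worker = [&]() {
    for (;;) {
      std::size_t i = next.fetch_add(1);
      if (i >= jobs.size()) return;  // no work left
      results[i] = jobs[i]();        // each index is written by one thread only
    }
  };

  // Never start more threads than there are jobs, and use at least one
  // thread when there is work to do.
  std::size_t n = std::min(std::max<std::size_t>(max_threads, 1), jobs.size());
  std::vector<std::thread> pool;
  pool.reserve(n);
  for (std::size_t t = 0; t < n; ++t) pool.emplace_back(worker);
  for (auto& th : pool) th.join();
  return results;
}
```

With this shape, the thread limit and the number of jobs are independent: the limit bounds concurrency, while the job count only determines how long the workers stay busy.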
Not at all. I designed for this from the very beginning. The reason we never turned it on before is that there wasn’t an easy way to do multi-threading cross-platform before C++11, we didn’t have the map_rect design with all the back-end work you did on a parallelizable function, and we were seeing big slowdowns while running multi-threaded. Then when I saw that thread-local storage was a thing in C++11 along with multi-threading, I made this issue before I really knew the details for C++11. Sebastian then sorted it all out!
Actually, the models are thread-safe now other than the line number counter. So that’s going to have to be made thread-local, too, if we want reasonable error messages. If there is negligible overhead on this, then no big deal.
I’d still like all the thread-local declarations to be conditional on running multi-threaded code unless we can show there’s very low overhead to making thread-local declarations.
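A minimal sketch of what a conditional declaration could look like, using a line-number counter as the example. The macro names EXAMPLE_USE_THREADS and EXAMPLE_THREAD_LOCAL and the two accessor functions are made up for illustration; they are not the names used in Stan.

```cpp
#include <cassert>

// Hypothetical build flag and macro: the storage specifier is only emitted
// when the build opts in to threading, so single-threaded builds pay nothing.
#ifdef EXAMPLE_USE_THREADS
#define EXAMPLE_THREAD_LOCAL thread_local
#else
#define EXAMPLE_THREAD_LOCAL
#endif

// One counter per thread in threaded builds, one global counter otherwise.
static EXAMPLE_THREAD_LOCAL int current_line = 0;

inline void set_current_line(int line) { current_line = line; }
inline int get_current_line() { return current_line; }
```

Compiling with -DEXAMPLE_USE_THREADS switches the counter to per-thread storage without touching any code that reads or writes it, which also makes it easy to benchmark the two variants against each other.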
This is often a sign that you’re doing something like accessing illegal memory. The higher optimization levels are much better at reusing memory.
I’d rather avoid MPL if possible, as it’s a bit more restrictive than BSD.
If they’re not, running multi-threaded won’t work. They’ve been kept immutable after construction with one exception I’m aware of other than the line number.
That’s great news.
This is a better approach to singletons in C++11 anyway. I’ve been meaning to reconfigure everything in math to do this. Does thread_local work on function-local static variables?
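On the question of function-local statics: C++11 does allow thread_local on block-scope variables, and each thread then gets its own instance, constructed the first time that thread reaches the declaration. The sketch below uses a hypothetical Stack type and instance() accessor as stand-ins; it is not the math library’s actual singleton.

```cpp
#include <cassert>
#include <thread>

// Hypothetical stand-in for whatever state the singleton holds.
struct Stack {
  int n = 0;
};

// Function-local thread_local: one Stack per thread, lazily constructed.
inline Stack& instance() {
  static thread_local Stack s;
  return s;
}

// Returns true if a second thread sees a fresh instance at a different
// address, leaving the calling thread's instance untouched.
inline bool other_thread_sees_fresh_instance() {
  instance().n = 7;  // modify the calling thread's copy
  int seen = -1;
  const Stack* other_addr = nullptr;
  std::thread t([&] {
    seen = instance().n;      // new thread gets a default-constructed copy
    other_addr = &instance();
  });
  t.join();
  return seen == 0 && other_addr != &instance();
}
```

The accessor keeps the same call syntax as a classic Meyers singleton, so callers do not need to know whether the instance is per-process or per-thread.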
The compilers are very good at inlining simple function calls.
Absolutely. Multiple threads are way better when possible.
Exceptions that aren’t caught or seg faults are going to cause unrecoverable crashes in whatever we do unless we start engineering for failover.