C++, Multithreading, Benchmarking

In case any other C++ devs aren’t following the GitHub thread, here’s the discussion we’re having. If Steve isn’t available, input from anyone else is appreciated.

Again, the benchmarks only cover executing stan/math/prim/fun/exp.hpp. The text file with the results is available, and I’m happy to give a walkthrough. The change may also have had a positive effect on final Stan run times, but that’s hard to tell because the runs are stochastic. I’m posting here mainly for exposure to what I’m investigating, and any answers to questions are appreciated.
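For anyone who wants to poke at this kind of measurement themselves, here is a minimal sketch of a repeated-timing harness. It is not the benchmark I ran: it uses plain `std::exp` as a stand-in for the Stan function, and the helper names (`time_exp_pass_us`, `median_of`, `median_exp_time_us`) are made up for illustration. The idea is just that taking the median over several passes makes a noisy timing a bit easier to compare.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: time one element-wise pass of std::exp over x,
// returning the elapsed wall-clock time in microseconds.
double time_exp_pass_us(const std::vector<double>& x, std::vector<double>& out) {
  const auto start = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i < x.size(); ++i) out[i] = std::exp(x[i]);
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::micro>(stop - start).count();
}

// Median of a non-empty set of values (upper median for even sizes).
// Takes the vector by value because nth_element reorders it.
double median_of(std::vector<double> v) {
  assert(!v.empty());
  const std::size_t mid = v.size() / 2;
  std::nth_element(v.begin(), v.begin() + mid, v.end());
  return v[mid];
}

// Run the pass `repeats` times (repeats must be at least 1) and report
// the median timing, which is less sensitive to outliers than the mean.
double median_exp_time_us(const std::vector<double>& x, int repeats) {
  std::vector<double> out(x.size());
  std::vector<double> timings;
  for (int r = 0; r < repeats; ++r) {
    timings.push_back(time_exp_pass_us(x, out));
  }
  return median_of(timings);
}
```

Calling `median_exp_time_us(x, 25)` on a large input gives one number per configuration to compare. It says nothing about end-to-end Stan run times, which would need repeated full runs with fixed seeds to compare fairly.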

If multithreading prim improves performance, sure, I see no reason not to thread it. Especially if it’s something used repeatedly throughout the library, the speed gains could compound, but this needs to be evaluated more thoroughly.
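To make "threading prim" concrete, here is a rough sketch of what splitting an element-wise `exp` across threads could look like. This is an assumption-laden illustration, not Stan Math code: it uses `std::exp` on a `std::vector<double>`, and `exp_serial` and `exp_threaded` are hypothetical names. Each thread writes to its own contiguous slice of the output, so no locking is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

// Serial baseline: element-wise exp over a vector of doubles.
std::vector<double> exp_serial(const std::vector<double>& x) {
  std::vector<double> out(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) out[i] = std::exp(x[i]);
  return out;
}

// Hypothetical threaded variant: split the input into contiguous chunks
// and give each chunk to its own std::thread.
std::vector<double> exp_threaded(const std::vector<double>& x, unsigned n_threads) {
  assert(n_threads > 0);
  const std::size_t n = x.size();
  std::vector<double> out(n);
  // Ceiling division so every element lands in some chunk.
  const std::size_t chunk = (n + n_threads - 1) / n_threads;
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < n_threads; ++t) {
    const std::size_t begin = std::min(n, t * chunk);
    const std::size_t end = std::min(n, begin + chunk);
    if (begin == end) break;  // no work left for the remaining threads
    workers.emplace_back([&x, &out, begin, end] {
      for (std::size_t i = begin; i < end; ++i) out[i] = std::exp(x[i]);
    });
  }
  for (auto& w : workers) w.join();
  return out;
}
```

Spawning threads on every call has a fixed cost, so for small inputs the serial version will likely win. Whether it pays off at the sizes Stan actually sees is exactly what needs benchmarking.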

That said, the devs are focused on multithreading AD, which I think means running a separate thread per expression tree? If someone wants to elaborate a bit, that’d be great.