I think we should try to get a native C++ solution working, since we have a huge macOS user base who will struggle to get this to work. From reading around a bit, I think we can go for a thread pool like the one detailed here:
This would allow us to fix the number of threads up front. These threads would run constantly, and we would hand them work once in a while.
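To make the thread pool idea concrete, here is a minimal sketch using only the C++11 standard library. `ThreadPool`, `submit`, and `parallel_sum_of_squares` are hypothetical names I made up for illustration; this is not the implementation from the link above, just one common way to do it.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical class for illustration: a fixed set of worker threads
// that sleep on a condition variable until work is queued.
class ThreadPool {
 public:
  explicit ThreadPool(std::size_t n_threads) {
    for (std::size_t i = 0; i < n_threads; ++i)
      workers_.emplace_back([this] { worker_loop(); });
  }

  // Signal shutdown, wake all workers, and wait for them to finish
  // whatever is still queued.
  ~ThreadPool() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stop_ = true;
    }
    cv_.notify_all();
    for (auto& w : workers_) w.join();
  }

  ThreadPool(const ThreadPool&) = delete;
  ThreadPool& operator=(const ThreadPool&) = delete;

  // Queue a callable taking no arguments; the future delivers its result.
  template <class F>
  auto submit(F&& f) -> std::future<decltype(f())> {
    using R = decltype(f());
    auto task = std::make_shared<std::packaged_task<R()>>(std::forward<F>(f));
    std::future<R> result = task->get_future();
    {
      std::lock_guard<std::mutex> lock(mutex_);
      tasks_.emplace([task] { (*task)(); });
    }
    cv_.notify_one();
    return result;
  }

 private:
  void worker_loop() {
    for (;;) {
      std::function<void()> job;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
        if (stop_ && tasks_.empty()) return;
        job = std::move(tasks_.front());
        tasks_.pop();
      }
      job();  // run outside the lock so other workers can pick up tasks
    }
  }

  std::vector<std::thread> workers_;
  std::queue<std::function<void()>> tasks_;
  std::mutex mutex_;
  std::condition_variable cv_;
  bool stop_ = false;
};

// Hypothetical usage: split a sum over [0, n) into chunks and let the
// pool work through them.
long parallel_sum_of_squares(ThreadPool& pool, int n, int n_chunks) {
  std::vector<std::future<long>> parts;
  for (int c = 0; c < n_chunks; ++c) {
    const int begin = c * n / n_chunks;
    const int end = (c + 1) * n / n_chunks;
    parts.push_back(pool.submit([begin, end]() -> long {
      long s = 0;
      for (int i = begin; i < end; ++i) s += static_cast<long>(i) * i;
      return s;
    }));
  }
  long total = 0;
  for (auto& p : parts) total += p.get();
  return total;
}
```

The pool would be created once (e.g. `ThreadPool pool(4);`) and reused, so the thread count stays fixed no matter how many tasks get submitted.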
The other reason to go with vanilla C++ is that we do not have to do weird hacks for exceptions… OpenMP is really great for getting things working quickly, but in the end I think it is a very low-level technique that gives the code a heavy C-like feel.
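On the exception point: with OpenMP an exception thrown inside a parallel region has to be caught in that same region by the thread that threw it, so it must be stashed and rethrown by hand. With the standard library, `std::future` carries the exception back to the caller. A small sketch, where `checked_half` and `run_and_report` are hypothetical names for illustration only:

```cpp
#include <future>
#include <stdexcept>
#include <string>

// Hypothetical worker function that rejects negative input.
double checked_half(double x) {
  if (x < 0) throw std::domain_error("negative input");
  return 0.5 * x;
}

// Run the worker on another thread. If it throws, the exception is
// stored in the future and rethrown by get() on the calling thread.
std::string run_and_report(double x) {
  std::future<double> result =
      std::async(std::launch::async, checked_half, x);
  try {
    result.get();
    return "ok";
  } catch (const std::domain_error& e) {
    return std::string("caught: ") + e.what();
  }
}
```

The same mechanism applies to a thread pool that hands out futures via `std::packaged_task`, so no extra bookkeeping should be needed there.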
If you code on an experimental branch and want me to take a look, I am happy to do so.
It would be just awesome to have machine-local threading and MPI combined (oh, I forgot the GPU)!
Just to add: OpenMP is fine by me; I just think a C++-only solution is preferable, and we should try to get that running if we can do so with reasonable effort.