To parallelize sampling I initially used MPI. As I understand it, as long as you are not using more than one node, threading is no less effective. Right?
At the moment the TBB is used to parallelise map_rect, and I would expect MPI to give you the same performance. As far as I can see, the thread you refer to doesn’t even compare against MPI. In my experience, the TBB-based map_rect is now just as fast as MPI. Threading used to lag behind MPI in speed, but that gap is gone now that the TBB is used.
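For anyone unsure what is actually being parallelised here, below is a rough conceptual sketch of the map_rect pattern: one function is applied to each shard independently and the per-shard results are concatenated in shard order. It is written in Python purely as a stand-in and is not how Stan implements it; `map_rect_sketch`, `shard_sum_of_squares`, and the `max_workers` parameter are hypothetical names made up for illustration. It shows only the structure of the computation, not the performance, since Python threads do not speed up pure-Python arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor


def map_rect_sketch(f, phi, thetas, x_rs, x_is, max_workers=4):
    """Hypothetical stand-in for the map_rect pattern (not Stan's code).

    Applies f(phi, theta, x_r, x_i) to every shard and concatenates
    the per-shard result lists in shard order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, whichever shard finishes first.
        per_shard = pool.map(lambda args: f(phi, *args), zip(thetas, x_rs, x_is))
        return [value for shard in per_shard for value in shard]


def shard_sum_of_squares(phi, theta, x_r, x_i):
    # Toy per-shard function: shared parameter times shard parameter
    # times the sum of squares of the shard's real data.
    return [phi[0] * theta[0] * sum(x * x for x in x_r)]


phi = [2.0]                              # parameters shared by all shards
thetas = [[1.0], [0.5], [3.0]]           # one parameter vector per shard
x_rs = [[1.0, 2.0], [3.0], [0.0, 1.0]]   # real-valued data per shard
x_is = [[], [], []]                      # integer data per shard (unused here)

result = map_rect_sketch(shard_sum_of_squares, phi, thetas, x_rs, x_is)
print(result)  # [10.0, 9.0, 6.0]
```

Because the shards do not depend on each other, the same work can be spread over threads on one machine or over MPI processes, which is why I would expect the two to perform about the same on a single node.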