Benefits of parallelization with an Intel TBB thread pool

Hold on before you throw out hyperthreading altogether. You can see on my Threadripper machine that hyperthreading does improve things (albeit at a slower rate). I think it is more likely that the cache is saturated on the Intel machines. You can also see that on Stevo’s machine above, hyperthreading helps with the TBB reduce method at least, but not the TBB map. Note that my machine and Stevo’s actually have the same amount of cache, despite his having more cores. It seems to me that the cache needed per core/thread may be the limiting factor, not hyperthreading per se, which is on by default anyhow. You can ask for only 6 cores, but the OS may well spread that work over all 12 physical or virtual cores. You can see this happening on Linux if you run the top command and then press 1: it uses all the cores at low load rather than just 6 at max load. If the program needs more cache per core than the total cache allows, things slow down (or at least stop speeding up). No?
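To make the cache-per-thread argument concrete, here is a rough back-of-the-envelope sketch in C++. It is only a simplified model, not a measurement: it assumes one shared cache split evenly between the running threads, and the helper names (`cache_share_per_thread`, `working_set_fits`) are made up for illustration. Real CPUs have private L1/L2 caches plus a shared last-level cache, and two hyperthreads share one core's private caches, so treat the result as a hint only.

```cpp
#include <cstddef>

// Size unit used in the illustration below.
constexpr std::size_t MiB = 1024 * 1024;

// Hypothetical helper: the even share of a shared cache that each running
// thread gets. Assumes a single cache divided equally (a simplification).
constexpr std::size_t cache_share_per_thread(std::size_t total_cache_bytes,
                                             unsigned threads) {
    return threads == 0 ? 0 : total_cache_bytes / threads;
}

// Hypothetical helper: true if each thread's working set fits in its share.
// When this turns false, adding threads is expected to stop helping under
// this model, whether the extra threads are physical cores or hyperthreads.
constexpr bool working_set_fits(std::size_t total_cache_bytes,
                                unsigned threads,
                                std::size_t working_set_bytes) {
    return threads != 0 &&
           working_set_bytes <= cache_share_per_thread(total_cache_bytes, threads);
}
```

With made-up numbers (not taken from any machine in this thread), `working_set_fits(32 * MiB, 6, 4 * MiB)` is true while `working_set_fits(32 * MiB, 12, 4 * MiB)` is false. That is the kind of plateau I would expect if cache, rather than hyperthreading, is the bottleneck.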

The reason I was curious to see your MBP results is that your MBP and my MBP are very close in spec, with CPU speed and cache size among the few differences. On rough inspection, it looks like performance plateaus at 6 cores on your machine, while on mine it may even dip. You could use my csv file from the first post above to plot your results and mine on the same graph.