While running benchmarks I’ve discovered some weird, unintuitive C++ performance results that sometimes even go against traditional wisdom, and I wanted to publish them somewhere we can all see and reference. All the source is in the perf-math repo, though I need to organize it better soon. I’m going to collect weird benchmark results in this post and keep it up to date.
std::vectors
.reserve() followed by push_back() is slower than constructing the vector at its full size and assigning through operator[]
Something we’ve verified recently is that threading performance results don’t hold across different versions of compilers (gcc 5 vs gcc 6), they don’t hold across different compilers (clang++ vs g++), and they don’t hold across OSes (Windows gcc 4 vs Linux gcc 4).
I think it’s still pretty safe to assume that numeric computations are optimized similarly, although I’ve seen threads saying that’s not true for Intel. We should still check.
@increasechief and @stevebronder If you’re writing a design doc feel free to submit it as a pull request on the design-docs repo and just paste the link here. Then we can all comment on it there and it can evolve with feedback. Thanks!
Do you have examples of these? I remember seeing only that performance improvements might appear only for clang and not for gcc, but never the inversion of a benchmark result such that what seemed like the best answer on X was not the best answer on Y. Separately, of course, we saw that the Mac Pro had a much steeper performance penalty for pointer AD stacks than all of our Mac laptops (which I guess is a 4th dimension across which benchmarks can vary - hardware). But I don’t remember seeing benchmarks conflict about which code was faster, just the magnitudes, right?