Thanks—that’s a very helpful intuition.
We never looked into this again after Matt Hoffman told us it was a huge pain (he spent an internship at Google working on PGO, but that was at least 8 years ago).
If -O2 compiles faster, produces smaller output code, and the time penalty is in the noise for end-to-end runs, then by all means we should switch to -O2.
Definitely.
I’ve always had that turned on. Didn’t realize it was that significant. Is that for code with lots of big matrix ops? Those will be the places with lots of double-based loops to unroll.