-O2 vs -O3 compiler optimization level

Thanks—that’s a very helpful intuition.

We never looked into this again after Matt Hoffman told us it was a huge pain. He spent an internship at Google working on profile-guided optimization (PGO), though that was at least 8 years ago.

If -O2 compiles faster, produces smaller output code, and the run-time penalty is in the noise for end-to-end runs, then by all means we should switch to -O2.
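One rough way to make "in the noise" concrete is to time several end-to-end runs of each build and check whether the gap between the mean times is small relative to the run-to-run variation. The sketch below is only an illustration: the function name `within_noise`, the threshold `k`, and the timing numbers are all hypothetical, not measurements from this thread.

```python
import statistics


def within_noise(times_a, times_b, k=2.0):
    """Return True if the gap between mean run times is at most k standard errors.

    times_a, times_b: wall-clock times (seconds) from repeated end-to-end runs.
    k: hypothetical threshold; 2.0 is a loose rule of thumb, not a formal test.
    """
    mean_a = statistics.mean(times_a)
    mean_b = statistics.mean(times_b)
    # Standard error of the difference between two independent sample means.
    se = (statistics.variance(times_a) / len(times_a)
          + statistics.variance(times_b) / len(times_b)) ** 0.5
    return abs(mean_a - mean_b) <= k * se


# Made-up timings for illustration only, not real benchmark results.
o2_times = [10.2, 10.5, 9.9, 10.4, 10.1]
o3_times = [10.0, 10.3, 10.2, 9.8, 10.4]
print(within_noise(o2_times, o3_times))
```

With only a handful of runs this is a sanity check rather than a proper significance test, so more runs give a more trustworthy answer.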

Definitely.

I’ve always had that turned on. Didn’t realize it was that significant. Is that for code with lots of big matrix ops? Those will be the places with lots of double-based loops to unroll.