Compile flags for PyStan: not optimized for speed?

From Bob:

PyStan seems to be using -Os, which optimizes for target size
rather than speed. It’s described as doing all the -O2
optimizations that don’t increase executable size.

I don’t know why it’s not -O3.

If there’s a way to change it from -Os to -O3, I’d be curious
how that changes the result.
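
One possible way to try this, sketched under assumptions: PyStan 2.x releases accept an `extra_compile_args` list on `pystan.StanModel`, and passing it replaces the default flags rather than appending to them. The helper names `compile_args` and `build_model` are hypothetical, and the non-optimization flags below are assumed to match the defaults of the installed version, so check them before relying on this.

```python
def compile_args(opt_level="-O3"):
    """Return compiler arguments with the requested optimization level.

    Assumption: the remaining flags mirror PyStan 2.x defaults. Verify
    them against the PyStan version you actually have installed.
    """
    return [
        opt_level,
        "-ftemplate-depth-256",
        "-Wno-unused-function",
        "-Wno-uninitialized",
    ]


def build_model(model_code, opt_level="-O3"):
    """Compile a Stan program at the given optimization level (PyStan 2.x)."""
    # Imported lazily so compile_args() can be used without PyStan installed.
    import pystan

    return pystan.StanModel(
        model_code=model_code,
        extra_compile_args=compile_args(opt_level),
    )
```

Calling `build_model(code, "-Os")` and `build_model(code, "-O3")` on the same program would give two compiled models whose sampling times can then be compared.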

Before it was -Os, it was -O0:

stan-dev/pystan Pull Request #252, "fix: Include default `data` value in function signature" (by riddell-stan)

We’re being careful here since something broke with -O3 on certain
platforms.

Also from Bob:

I don’t think it matters. I just ran a test of optimization levels
on a simple test program:

  • -O0:
      Elapsed Time: 0.096003 seconds (Warm-up)
                    0.113985 seconds (Sampling)
                    0.209988 seconds (Total)

  • -Os:
      Elapsed Time: 0.012302 seconds (Warm-up)
                    0.023481 seconds (Sampling)
                    0.035783 seconds (Total)

  • -O3:
      Elapsed Time: 0.012289 seconds (Warm-up)
                    0.023156 seconds (Sampling)
                    0.035445 seconds (Total)

So at least with g++ on the Mac, -Os and -O3 don’t look any different.
This surprised me, because -Os is described as doing all the optimizations
that don’t increase size; I figured most of the optimizations would
increase code size, but then we have many, many one-liners that are just
as small, if not smaller, once inlined.
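
To put numbers on "don’t look any different", here is a small sketch that works only from the total times reported above. The function names are made up for illustration, and each figure comes from a single sub-second run, so the small gap between -Os and -O3 may well be within run-to-run noise.

```python
# Total elapsed times in seconds, copied from the three runs reported above.
totals = {"-O0": 0.209988, "-Os": 0.035783, "-O3": 0.035445}


def speedup(baseline, candidate):
    """How many times faster `candidate` ran compared with `baseline`."""
    return totals[baseline] / totals[candidate]


def relative_gap(a, b):
    """Absolute difference between two runs as a fraction of run `b`."""
    return abs(totals[a] - totals[b]) / totals[b]


print(f"-O0 vs -Os: {speedup('-O0', '-Os'):.1f}x")
print(f"-O0 vs -O3: {speedup('-O0', '-O3'):.1f}x")
print(f"-Os vs -O3 gap: {relative_gap('-Os', '-O3'):.1%}")
```

On these numbers both optimized builds are roughly six times faster than -O0, while -Os and -O3 differ by about one percent.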

  • Bob

Did you get my other message that I couldn’t measure a difference
between -Os and -O3 with g++ on my MacBook? So I don’t think this is
an issue.

  • Bob

Guess that answers my question :-)

  • Bob

I’d still test it out just to make sure. That was with the version of g++ that’s distributed with Xcode. If you’re running with GPUs, you’re using a different g++ version, which might behave differently.
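
A rough sketch of how such a check could be scripted on another machine, assuming a standalone C++ test program and a compiler reachable as `g++` on the PATH. The function names, the `bench_*` output names, and the choice of taking the fastest of several runs are all illustrative choices, not anything from the thread.

```python
import subprocess
import time


def build_command(flag, source, output, compiler="g++"):
    """Command line that compiles `source` at the given optimization flag."""
    return [compiler, flag, source, "-o", output]


def best_time(command, repeats=5):
    """Run `command` several times and return the fastest wall-clock time."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return min(timings)


def compare_flags(source, flags=("-O0", "-Os", "-O3"), compiler="g++"):
    """Compile `source` once per flag and time each resulting binary."""
    results = {}
    for flag in flags:
        # e.g. "-O3" becomes "./bench_O3"
        output = "./bench" + flag.replace("-", "_")
        subprocess.run(build_command(flag, source, output, compiler), check=True)
        results[flag] = best_time([output])
    return results
```

Something like `compare_flags("test_program.cpp")` would return a dictionary of timings per flag; repeating the runs helps separate a real difference from noise when the program finishes in a fraction of a second.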