I’m curious if anyone’s tried using Intel’s compilers and seen any performance improvements? They tend to autovectorize code more easily than GCC or Clang, but lag on fancier C++ features & standards.
It has been tried, mostly by Sebastian. You have to distinguish between icpc and the Intel Math library. Most of the benchmarks showing that the Intel Math library is faster are for big operations involving doubles, which don’t happen that much in Stan, although they might, depending on what is in your model. The icpc compiler definitely takes a lot more RAM and time to compile a Stan model.
Right, I definitely mean, does icpc generate correct code which is faster (better vectorized or whatever)?
HMC warmup takes > an hour for the models I’m looking at, so 10 minutes compilation is fine if it speeds up sampling by more than say 5%.
My guess is no, unless you are doing big matrix operations or ODEs or something.
It loses a bit of precision, too, if you run it in fast mode. We had to loosen some of our tests so it’d pass on some of the Intel compiler settings.
But no, I don’t think anyone’s seen any huge speedups beyond what you get version to version in something like g++. I’d be interested to see benchmarks on real models people care about.
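On the precision point: as far as I understand it, a "fast" floating-point mode is allowed to reorder additions so that they can be vectorized, and floating-point addition is not associative. Here is a minimal sketch in Python (not the Intel compiler itself; the lane-splitting function and the input values are my own illustration, chosen to make the effect obvious) that sums the same doubles strictly left to right and then in two interleaved "lanes", roughly the way a vectorized reduction would.

```python
import math


def sum_sequential(values):
    """Add the values strictly left to right, as non-vectorized code would."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_in_lanes(values, lanes):
    """Add the values in `lanes` interleaved partial sums, then combine them.

    This only imitates the order of operations of a SIMD reduction; it is an
    illustration, not what any particular compiler emits.
    """
    partial = [sum_sequential(values[i::lanes]) for i in range(lanes)]
    return sum_sequential(partial)


# Illustrative input: two huge terms that cancel, plus two small terms.
values = [1.0, 1e16, -1e16, 1.0]

print("sequential:", sum_sequential(values))   # 1.0 (one of the small terms is lost)
print("two lanes: ", sum_in_lanes(values, 2))  # 0.0 (both small terms are lost)
print("math.fsum: ", math.fsum(values))        # 2.0 (correctly rounded reference)
```

Real models rarely cancel this badly, so the differences are usually in the last few digits, which is consistent with tests needing slightly looser tolerances rather than failing outright.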
I looked at this and found Intel was much slower at compiling the Stan program, and the compiled program was ~10% slower at running.
The part about compile time is universally true AFAIK with Stan; the part about runtime varies depending on the model.
Is this still true with new versions of GCC? I ask because someone had posted some results three years ago with an older version of GCC in which ICC was faster, but when I tried with the latest GCC and the same model, GCC was as fast as ICC.
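If anyone wants to rerun this kind of comparison, here is a rough sketch of a timing harness in Python. It assumes you have already built the same model once with each compiler; the two commands below are placeholders that just run the Python interpreter so the script works anywhere, and you would swap in your own executables and arguments. It takes the median of several runs because a single wall-clock timing is noisy.

```python
import statistics
import subprocess
import sys
import time


def median_runtime(cmd, repeats=5):
    """Run `cmd` `repeats` times and return the median wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def relative_speedup(baseline_seconds, candidate_seconds):
    """Fraction of the baseline runtime saved by the candidate (negative = slower)."""
    return (baseline_seconds - candidate_seconds) / baseline_seconds


if __name__ == "__main__":
    # Placeholder commands (assumption): replace with the model binary built by
    # each compiler, run with the same data and the same seed.
    gcc_cmd = [sys.executable, "-c", "sum(range(200000))"]
    icc_cmd = [sys.executable, "-c", "sum(range(200000))"]

    gcc_time = median_runtime(gcc_cmd)
    icc_time = median_runtime(icc_cmd)
    print(f"baseline median:  {gcc_time:.3f} s")
    print(f"candidate median: {icc_time:.3f} s")
    print(f"relative speedup: {relative_speedup(gcc_time, icc_time):+.1%}")
```

Fixing the seed and the data across both builds matters here, since otherwise differences in the sampler's trajectory can easily swamp a 5-10% difference in raw speed.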
I found my original post on this by the way: