Using unlikely/likely macros from Boost?

Hi!

Has anyone considered using the

BOOST_LIKELY and BOOST_UNLIKELY

macros, which are defined for many more compilers than just GNU gcc?

Boost has definitions for g++, clang++, and the Intel compiler, as far as I can tell.

Was this considered, or is there some other reason not to use them?

Best,
Sebastian

See
http://www.boost.org/doc/libs/1_60_0/boost/config/compiler/gcc.hpp
http://www.boost.org/doc/libs/1_60_0/boost/config/compiler/clang.hpp
http://www.boost.org/doc/libs/1_60_0/boost/config/compiler/intel.hpp
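
For illustration, usage would look something like this (a sketch of mine, not code from the Stan sources; on compilers without hint support, Boost falls back to the identity, so it compiles everywhere):

```cpp
#include <boost/config.hpp>  // defines BOOST_LIKELY / BOOST_UNLIKELY
#include <cstdio>

// On g++/clang++/Intel the macros expand to __builtin_expect; on
// compilers without hint support, BOOST_UNLIKELY(x) is just x.
double safe_inv(double x) {
  if (BOOST_UNLIKELY(x == 0.0))
    return 0.0;  // rare error path, hinted as cold
  return 1.0 / x;
}

int main() {
  std::printf("%f\n", safe_inv(4.0));
  return 0;
}
```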

We could use the BOOST forms instead if they do the same thing
or better. It looks like they'd reduce to our existing usage for
g++. Is the Boost module header-only for them? If so, I'd rather
use the BOOST versions and push all the compiler-type ifdefs
off onto Boost; they're much more likely to keep up with the
compilers than we are.
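
Concretely, I'd expect the switch to look something like this (the
"before" block is our current macro setup from memory, so treat it
as a sketch):

```cpp
// Before: compiler dispatch we maintain ourselves, roughly:
//
//   #ifdef __GNUC__
//   #define likely(x) __builtin_expect(!!(x), 1)
//   #define unlikely(x) __builtin_expect(!!(x), 0)
//   #else
//   #define likely(x) (x)
//   #define unlikely(x) (x)
//   #endif

// After: Boost.Config owns the per-compiler dispatch.
#include <boost/config.hpp>

#define likely(x) BOOST_LIKELY(!!(x))
#define unlikely(x) BOOST_UNLIKELY(!!(x))
```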

  • Bob

Hi!

From a quick look, Boost.Config appears to be header-only. Do we have a good benchmark for Stan with which I could test whether clang-compiled programs speed up once branch-prediction hints are enabled?

Sebastian

The best thing to use might be the tests from the autodiff
paper. They're on GitHub under gelman/papers/agrad_rev,
if Andrew gave you access to his repo.

We moved all the private repos off of stan-dev and into personal
repos.

  • Bob

Hi!

I have now managed to run benchmarks using the examples you sent me, Bob. Specifically, I used the normal_log_density_stan example, and I cannot really draw any conclusions from it. Before moving on to clang++, I first tested whether g++ sped up from the use of the likely macro. All of this was run on my macOS 10.12 laptop using the current dev version of Stan.

  • g++ 4.6 seems to benefit from using likely, but this is far from clear from the benchmark
  • g++ 6 seems to slow down when the __builtin_expect macro is defined
  • clang can be faster for small sizes, but seems to be slower for large sizes

All of the above would need extensive testing to get the statistics right. Maybe we have some better tests for this?

As of now, I would only recommend switching to the Boost macros if we really think the hints are buying us something, and from what I have seen so far, I don't know that they are. I remember Mike seeing major speedups for higher-order derivatives… so maybe those cases can answer the question more clearly.

Best,
Sebastian

PS: Stan 2.6.3, the version quoted in the paper, does not work out of the box with the code you sent me. Maybe there was an update in between? The current dev version of stan-math worked with it more easily…

What’s Stan 2.6.3? Or… if you’re really using Stan 2.6.3, could you try with 2.12.0?

Hi!

I was first using Stan 2.6.3, as that is the version referred to in the arXiv autodiff paper. My comment referred to the fact that the sources Bob sent me did not work out of the box with 2.6.3, so maybe the sources were updated after the paper was put on arXiv.

However, the results I reported above are against the current dev version. I assume that's close enough to 2.12.0?

Sebastian

I would say if you can’t find measurable improvements, don’t
bother changing anything. It’s just more work for everyone.

I wasn't sure what the bulleted points were comparing: Stan
2.12 vs. a version using Boost? So when you say g++ 6 seems to
slow down, which is slower, the way we have it now or the way with
Boost?

And how much difference and how much variability are you talking
about?

  • Bob

Hi!

I followed up on gcc 6 to disentangle whether

  • Boost's likely vs. "bare" computation (no branch-prediction hints)
  • Stan's vanilla likely vs. "bare"

behave differently (the harness is sketched below). To me it looks like I am a victim of small effect sizes here: it is all noise. I would have to do serious simulation work to find out more and report anything useful. Let's just put this aside… it looks like digging in the dirt here.
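
For the record, my harness was along these lines (an artificial hot loop for illustration, not the actual normal_log_density_stan code; HINT_MODE is a compile-time switch I made up for the sketch):

```cpp
#include <chrono>
#include <cstdio>
#include <vector>

// Compile-time switch: 0 = bare (no hints), 1 = Stan-style builtin,
// 2 = Boost macro.  E.g.: g++ -O3 -DHINT_MODE=2 bench.cpp
#ifndef HINT_MODE
#define HINT_MODE 0
#endif

#if HINT_MODE == 1 && defined(__GNUC__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#elif HINT_MODE == 2
#include <boost/config.hpp>
#define UNLIKELY(x) BOOST_UNLIKELY(!!(x))
#else
#define UNLIKELY(x) (x)
#endif

int main() {
  std::vector<double> x(1 << 20, 1.5);
  x[12345] = 0.0;  // rare value that triggers the cold branch

  auto start = std::chrono::steady_clock::now();
  double sum = 0.0;
  for (int rep = 0; rep < 200; ++rep)
    for (double v : x) {
      if (UNLIKELY(v == 0.0))
        continue;  // cold path, taken once per sweep
      sum += 1.0 / v;
    }
  auto stop = std::chrono::steady_clock::now();

  std::chrono::duration<double> dt = stop - start;
  std::printf("sum = %g, time = %.3f s\n", sum, dt.count());
  return 0;
}
```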

Sebastian

We discussed this at the last Stan meeting and Sebastian’s timings didn’t show an actual improvement in g++ or clang++.

I believe Sebastian's action item is to see whether there's a measurable improvement with the Intel compiler; if so, we'll consider the switch. If not, we'll put this to rest.