Test systems & supported compiler/os

wds15 · April 18, 2018, 6:55am

Hi!

… I think we need to revise our test platforms to get threading properly tested. We just broke threading support under clang in develop due to this part not being tested. This is due to a complicated situation:

Travis only has a super old Linux Ubuntu LTS 14.04 which causes a ton of problems => we can test things with g++, but not with clang (which needs extra hacks which we do not ship at the moment with stan-math as its beyond me to reliably configure the build system to detect systems which need this fix apparently needed for 14.04).
our Jenkins runs with a pre-2016 clang Apple compiler which does not support the C++11 thread_local keyword. Hence, we cannot test on the Jenkins box any of the threading stuff.

So much for the background. A solution from my angle would be to use on the Jenkins box an open-source clang which does support the thread_local keyword. My suggestion here would be to use the so-called clang4 as used for CRAN binaries for macosx. Would others agree on that this is a good idea to switch the target compiler on macos to the clang4?

On an additional note I wonder what our policy is for supported compilers. That is, with the threading I am running into all sorts of walls as travis shipping too old distributions, compiler having bugs, etc… It is really hard for me to digest all this and even harder to get a handle on systems which I do not have access to. I have started to download virtualbox to have a chance to understand this, but honestly, I do not really like to go down that route unless we decide that’s needed. I mean, it is a lot to expect from me to juggle virtualbox LTS images in order to run tests - my opinion.

Threading is super important, but this feature does not fly on all compilers/os combinations equally well. How can we deal with that in a way that makes development easy for us?

Any thoughts here would be helpful. The goal is to find a good solution for developers and users.

Best,
Sebastian

sakrejda · April 18, 2018, 4:46pm

It sounds like you’re getting bogged down with this. I can prioritize helping testing this over my pet projects (sigh…) once the next conference is over (April 30th). I agree that what we need to finalize the PR is to stop moving the target (discovering new broken configurations that “might” be important).

On Linux would mean covering: stable and latest supported red-hat, stable and latest supported Ubuntu b/c of clusters and desk-tops respectively. Even clusters with old red-hat installs seem to usually allow you to select an up-to-date compiler. I tend to use odd distributions for personal stuff (Gentoo and Arch) but those are usually more up-to-date than Ubuntu/Red-Hat so this is a non-issue there.

For Windows we’re pretty well tied to whatever Rtools distributes

For OS X given the trouble the native compiler has given us we could just go with whatever is current on Homebrew if we link/maintain decent install instructions.

We can always evolve our target and fix things if we accidentally cut out users we think we should be supporting but hidden requirements are just going to kill progress on any deep changes like this.

ariddell · April 18, 2018, 5:53pm

What clang version do you need?

pystan tests using gcc and clang on travis. The build setup is:

ubuntu trusty
gcc 4.8.4
clang 5

see it in action: https://travis-ci.org/stan-dev/pystan

wds15 · April 19, 2018, 8:08am

ubuntu trusty is too old for threading. It requires weird hacks which can make it work, but we decided against including these. trusty is from April 2014 and super old.

wds15 · April 19, 2018, 8:26am

Hi!

Yeah, this threading stuff is taking a serious amount of time by now. There is good news: It looks like that I can test on travis on osx with the Xcode 9.3 clang compiler suite. That is a recent enough Apple clang which supports thread_local.

Having gone through a lot of trouble here is what I suggest for the threading:

The code will by default (no threading) stick to an implementation which should give no performance penalty. This will use static variables of a class and as such gives us external linkage. This is essentially the old behavior (which suffers from being prone against the static initialization order catastrophe).
When threading is turned on I am suggesting to use the apparently standard C++ approach to let a static function return a reference to a static variable declared inside that function. This will incur some performance penalty (we have seen up to ~15%, but it really depends on the model). On the upside is that this is the standard way to implement thread-safe singletons. Hence, we can expect that this sort of code is well supported by the compilers. From messing with all of this we have learned the hard way that deviations from it easily kill portability.

To make this happen, I am affraid that we will need to define a macro to access the stack. This macro definition will be different depending on threads being used or not. I hope everyone is OK with this.

So calls like this:

ChainableStack::instance_.nested_var_stack_sizes_.empty();

will read like

STAN_MATH_ADSTACK(nested_var_stack_sizes_.empty());

and the definition of the STAN_MATH_ADSTACK will depend on the use of threading or not.

This should give use best-case performance without threads and when we use threading things should be safe.

Tagging @seantalts, @Bob_Carpenter, @syclik .

@sakrejda Help is very welcome in the form of more ideas / advise / code / …

Best,
Sebastian

seantalts · April 19, 2018, 4:51pm

I was just saying in the meeting that it could be worth investigating what I think is essentially @ChrisChiasson’s idea presented here: https://github.com/stan-dev/math/pull/840#issuecomment-382127187 . The only thing this buys us is the lack of a macro, and it could potentially have a performance penalty although the way he suggests it, we wouldn’t face that in anything Stan or even RStanArm. But I won’t have much time to devote to it and Bob doesn’t care much about the macro, so I’m open to just doing it that way too.

wds15 · April 19, 2018, 5:03pm

Ah. So you mean we always call a context() function, but for the non-threading case this just returns the global and in the threading case we return the function local static?

I had that idea as well. It would look very odd…but could do it. Is that how you understood it as well?

seantalts · April 19, 2018, 5:06pm

Yeah - but I haven’t had time to dig into what it would actually look like. So if you think a macro would be cleaner feel free to go with that, or experiment and choose one, either way.

wds15 · April 19, 2018, 5:09pm

I will try that first, since this can essentially reuse my old stuff with minor modifications.

I hope my git fu serves me well on this one as I have to go back in time… let’s see.

Bob_Carpenter · May 1, 2018, 3:02am

That’s certainly too much!

I think failures on Travis, etc., highlight the kinds of problems users will run into when we start demanding more advanced compiler support.

Topic		Replies	Views
One Compiler Per OS Developers	103	3741	July 24, 2018
Potentially dropping support for older versions of Apple's version of Clang General	14	1643	April 13, 2018
Makefiles and compiler flag `-DSTAN_THREADS` - always on? Developers	7	1473	April 22, 2020
Threading in pystan Interfaces pystan	3	781	October 16, 2018
15-20% ish performance regression Developers	6	865	April 9, 2018

Test systems & supported compiler/os

Related topics