And some places like Novartis have internal policies that prevent them from installing stuff directly from GitHub.
I don’t know how much of Novartis’s policy is public, so I’ll let @wds15 answer on list.
One of the main concerns of companies limiting usage of software is exposure to IP lawsuits, and CRAN can’t possibly be providing IP indemnification for arbitrary packages. So I imagine any companies restricting software wouldn’t provide a blanket OK for 10K+ CRAN packages.
Other companies draw the line at blocking their developers from using or even reading GPL code. IBM research used to do this in the early oughts—my friends had sysadmins show up at their office door when they detected GPL installs on monitored machines to uninstall them.
I doubt many places blanket-approve any CRAN package, but plenty blanket-disapprove GitHub.
This isn’t true - we can design installers that provide binaries for each of our supported OSes.
I think it might be useful to frame this discussion towards our top-level goal of providing Stan to our users, rather than subgoals that we previously used to achieve the top-level goal (e.g. CRAN support). It’d be nice to have a list of our categories of users (e.g. MPI on some cluster (academic? commercial?), Andrews, etc) and then see how we can provide to the largest subset of them.
Also just to reiterate and potentially obviate this discussion, I’m volunteering to try to set up a modern gcc on Linux if someone can give me a version number available on Ubuntu that doesn’t crash.
Depends what you mean. TensorFlow is only offering very limited support for old GCC versions on Linux, Clang on Xcode, and MSVC 2015 on Windows.
Installation instructions: https://www.tensorflow.org/install/install_sources
I cherry picked some quotes from the above page:
Note that we provide well-tested, pre-built TensorFlow binaries for Ubuntu, macOS, and Windows systems. In addition, there are pre-built TensorFlow docker images. So, don't build a TensorFlow binary yourself unless you are very comfortable building complex packages from source and dealing with the inevitable aftermath should things not go exactly as documented.
We do not support building TensorFlow on Windows.
Note: Starting from 1.6 release, our prebuilt binaries will use AVX instructions. Older CPUs may not be able to execute these binaries.
Digging down into their tested platforms:
The tested compilers are: gcc 4.8 on Linux, Xcode clang on Mac OS, and MSVC 2015 update 3 on Windows.
Finally, if you look at this issue raised by JJ Allaire, you can see that he’s using a docker image of Tensorflow rather than installing from source directly in R. That presumably allows them to get around the incompatible compiler issue with R.
I don’t think this discussion should include scare quotes.
Side note: has anybody made a table of the gcc versions on common Linux distributions and checked if they’re a source of problems? I agree with Ben that in non-corporate projects it seems pretty common to be flexible about fixing compiler-related bugs without expecting a PR.
We could build installers for Columbia R packages. Doing that in a secure fashion is a pretty big undertaking. Competent users yelled at us back when we merely had a script that went through the installation process. Also, are you proposing we build an installer for anyone outside of Columbia who wants to make a Stan-related R package?
Abandoning CRAN would cause us to lose a lot of users from all categories.
g++-6 does not crash that test on that machine, but I object to the framing: If it is in fact a g++-7 bug with lgamma, then it is our responsibility to file the bug with the gcc people.
My objection is not primarily about what configurations Stan tests. The primary objection is that we would say that we are not going to devote resources to fixing things under configurations that we do not personally test.
I would be surprised if anyone that regularly uses a non-Windows / non-Mac computer would not find it appalling for an open-source project to say something like “we are not going to fix problems on stable releases of gcc”. I don’t see anything to suggest that TensorFlow has that policy. The bug reports you linked to say things like
This is not an issue with GCC, it is due to AVX 512. We still do not have official AVX 512 support
Can you comment on how to override to a supported version of GCC here?
It is from the Xenial repo:
g++-7 (Ubuntu 7.2.0-1ubuntu1~16.04) 7.2.0.
Nor do the RStudio people say things like “let’s build installers for a bunch of platforms that work for one compiler per OS that users can download directly from us instead of going through CRAN”. The RStudio people make their products work through CRAN
That CRAN package is a wrapper around an install script that downloads a pre-compiled binary that isn’t hosted on CRAN, right?
Not a scare quote. It’s a use-mention distinction. I’ve done way too much typesetting and linguistics.
Yup. Here’s the relevant source file: https://github.com/rstudio/tensorflow/blob/master/R/install.R
And here are the install instructions: https://tensorflow.rstudio.com/tensorflow/articles/installation.html
I was giving you the benefit of the doubt assuming you just experienced momentary irritation but now it’s clear you carefully constructed that sentence on purpose and I’m truly appalled. :P
It seems like the contention here is that there is a potential perception that Stan will no longer fix bugs for certain compilers. I don’t think that’s what is being proposed, but would it help if we wrote a policy stating that compiler bugs aren’t a WONTFIX for certain compilers? Testing on multiple platforms and compilers shouldn’t be a major bottleneck, but it is. It’s resource intensive, makes the turnaround time on pull requests slower, and until we get continuous integration stable, it really does slow things down. I think we need to balance the two.
Would it suffice to have:
- a policy saying that we will build in workarounds for compiler bugs for a set of compilers (itemized with minimum [and possibly maximum] version) on each platform
- a list of exactly what we’re testing on for every single pull request
- possibly have a set of tests that’s run periodically on more compiler / platform combinations?
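One way to make the first bullet concrete is a small table of supported version ranges plus a check against it. This is only a minimal sketch: the `EXAMPLE_POLICY` entries, the version numbers in it, and the `is_covered` helper are all made-up placeholders for illustration, not a proposed or existing Stan policy.

```python
from typing import Dict, NamedTuple, Optional, Tuple

Version = Tuple[int, ...]


class SupportRange(NamedTuple):
    minimum: Version
    maximum: Optional[Version] = None  # None means "no upper bound"


# Placeholder policy: every platform, compiler, and version here is
# hypothetical and only illustrates the shape of such a table.
EXAMPLE_POLICY: Dict[Tuple[str, str], SupportRange] = {
    ("linux", "gcc"): SupportRange(minimum=(4, 9)),
    ("macos", "clang"): SupportRange(minimum=(6, 0), maximum=(9,)),
}


def parse_version(text: str) -> Version:
    """Turn a dotted version string such as '7.2.0' into (7, 2, 0)."""
    return tuple(int(part) for part in text.split("."))


def is_covered(platform: str, compiler: str, version_text: str,
               policy: Dict[Tuple[str, str], SupportRange] = EXAMPLE_POLICY) -> bool:
    """Return True if the compiler version falls inside the policy range."""
    entry = policy.get((platform, compiler))
    if entry is None:
        return False  # compiler not itemized for this platform
    version = parse_version(version_text)
    if version < entry.minimum:
        return False
    if entry.maximum is not None:
        # Compare only as many components as the bound specifies, so a
        # maximum of (9,) still admits 9.1.0.
        if version[:len(entry.maximum)] > entry.maximum:
            return False
    return True
```

With a table like this, a bug report could be triaged by asking whether `is_covered("linux", "gcc", "7.2.0")` holds for whatever ranges the project actually agrees on.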
We recently said we weren’t going to actively fix a template compiler error on a new version of an Intel compiler. It was verified that the same code could be compiled on an older version. Was that the right thing to do?
It was a bit incorrect of me to describe the proposed policy as WONTFIXING compiler issues, but the proposal is COLUMBIA_WONTBEFIXING them, which I have the same criticisms of.
Minimum versions are fine and indeed encouraged in the open-source community if doing so gets you better standards compliance. Maximum versions are really frowned upon, except for pre-releases. Setting minimum = maximum is unheard of AFAIK.
The non-open-source compilers are a bit different and I don’t feel any obligation to them. What we did in the past with the Intel compiler was mostly at the behest of Novartis, and I don’t think they are using it for Stan anymore because it lacked support for something that Stan has in it now. MSVC is tougher because of PyStan, but last I heard, I thought PyStan was moving back toward mingw on Windows. Oracle is a CRAN platform with no users, but for C++ packages, the tests run with g++ rather than Oracle Developer Studio.
I think we all make our best attempt at workarounds for compiler bugs when we can verify and really figure out what to do. I don’t see anyone saying they won’t fix them. Even if it’s for compilers that are uncommon.
Pragmatically speaking, it’s a matter of checking before each PR or waiting for someone to reply that it’s broken. I’ve usually been on the side of checking before merging. If it were easy to check, I’d be all for it, but right now Travis isn’t stable.
If that doesn’t suffice, then perhaps you could suggest what would? Would you be able to help configure Travis + other boxes so we can test them on every PR? Right now, almost all this work is falling on Sean and it’s a burden. Maybe reducing his burden would allow us to expand what we test. If not, it makes sense to pick a minimal set of things and have a policy where we do try to create workarounds for different compiler versions when the bugs come up (and they will).
Lemme call out a few points I think are missing:
It seems like there’s been a large number of weird pull reqs. in the last year that have made compilation tricky. GPUs, MPI, and complex numbers in particular stand out. There was some threading and OpenMP stuff too? Surely this will slow down soon?
I’ve always developed vanilla Math library stuff (CPU + Eigen + Boost). The CI stuff works well for me; it’s super helpful, and it’s gotten better in the year I’ve used it (thanks for clang-format @seantalts!). I use clang on my laptop, but the g++ tests have helped me catch many minor bugs (type mismatches, private vs. public member access stuff), so I like them.
I’ve always felt like turnaround on the CI stuff is fast enough. I make a pull req. and then go do something else and come back and check. It’s faster than my laptop, and it’s faster than reviewers. It’s not been the primary bottleneck on any of my pulls.
In general I agree with Bgoodrich.
But to do anything other than muddy the waters with my opinions, I’d need to see the exact list of current build targets and then the proposed list of build targets. I assume there’s some combination of turning on/off GPUs/MPI/Distribution tests that’d make everyone happy, but maybe not.
Thanks everyone. I definitely learned a few things here already, and not just about “use-mention.”
I think it might be useful to try to summarize - first, here are what I think are the main pain points:
- Testing takes a long time and we’ve started to see jobs queued up for days at a time before they can go through.
- Travis is flaky - it often has spurious timeouts as well as a weird Ubuntu image.
- The gauntlet of outdated compilers is fairly daunting and difficult to actually get non-trivial code through. There are many examples, but see https://github.com/stan-dev/math/pull/789 for one. This slows down development very drastically for a small percentage of PRs.
- We anticipate adding many more tests, and we have barely begun any thorough end-to-end testing yet.
- We obviously have limited resources both paid and unpaid, and want to focus on high ROI goals (we should probably start a separate thread on what exactly these are - I think maybe first in Bob’s and my mind would be allowing statisticians to fit complex, bespoke Bayesian models that no one else can fit, but I know there are diverse perspectives here).
I think there are a few underlying factors we might try to look at when looking around for solutions to our problems:
- Totally my fault - I’ve been increasing the tests we run over the past year or so to try to actually test everything I’ve noticed us claiming to support. Eventually that target looks something like, minimally, (Compilers x OSes x powerset(MPI, GPU, threads, OpenMP, …)). This thread is essentially my attempt to back off from that trend and go back to something more reasonable for our resources and number of tests.
- Distribution tests are a large part of it, but short of the row vector thing I haven’t come up with any mitigating solutions here yet.
- Hardware situation - I’m trying actively to get more machines somehow, but it’s pretty expensive and turns out to be hard to find free colocation. We’re looking at something like $10k a year for another machine in the cloud.
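To get a feel for how quickly the (Compilers x OSes x powerset(MPI, GPU, threads, OpenMP, …)) target grows, here is a minimal sketch that enumerates it. The compiler and OS lists are placeholders chosen for illustration, not our actual CI configuration, and `build_matrix` is a hypothetical helper name; only the four feature names come from the list above.

```python
from itertools import chain, combinations, product


def powerset(items):
    """Yield every subset of items, from the empty set to the full set."""
    items = list(items)
    return chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )


def build_matrix(compilers, oses, features):
    """Return every (compiler, OS, enabled-feature-set) combination."""
    return [
        (compiler, os_name, frozenset(enabled))
        for compiler, os_name, enabled in product(
            compilers, oses, powerset(features)
        )
    ]


# Placeholder inputs: two compilers and three OSes are assumptions made
# only to show the arithmetic.
compilers = ["gcc", "clang"]
oses = ["linux", "macos", "windows"]
features = ["MPI", "GPU", "threads", "OpenMP"]

matrix = build_matrix(compilers, oses, features)
print(len(matrix))  # 2 compilers * 3 OSes * 2**4 feature subsets
```

The count is `len(compilers) * len(oses) * 2 ** len(features)`, so every additional optional feature doubles the number of configurations, which is why backing off from testing the full product matters so much for queue times.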
At this point it might be best to outline what we think our demographic is. Once we have that in mind, we can look at how different possible solutions affect each of the groups (and what % of our existing users overall). Though if @jjramsey is right about GCC being the only legitimate compiler for MPI, I think that might trump everything else, since MPI is very important to us. @jjramsey, does that mean MPI usually needs the default GCC compiler for an OS? (So for Ubuntu Xenial, that’d be the one that comes with build-essential, 5.4 I think.) Not sure how many clusters run Ubuntu though.
And to summarize the state of our current disagreement - it seems like most people are okay with reducing the number of compilers we test on PRs and merges to develop to one per OS already, but many people think the Linux one should be a version of GCC and the MPI constraint seems like a forcing issue.
So I think the only remaining issue is the level of support we provide (where “support” and “we” seem open to debate) for compilers we aren’t testing. I am mostly saying that no one on the project should feel obligated to fix compiler bugs for compilers we haven’t anointed, but of course we’ll accept issues and PRs and I think the normal mechanisms behind open source will continue to work on these compiler bugs without instilling an explicit sense of obligation into any particular group of Stan developers. @bgoodri thinks this means that there’s some nontrivial chance we start failing tests or builds on some of the compilers that CRAN forces us to test, and that this would end up forcing him to fix the bugs for these other compilers. Which I think could incentivize him to help us get off CRAN ;P But seriously, this is a real issue as long as we’re on CRAN.