Build tool - make vs?

Hey,

I think I remember @sakrejda mentioning in a Stan meeting that he was working on a cmake version of Stan (not sure which repo this was, maybe math?). Is that still going on? I’ve been wondering how difficult it would be to switch to a more modern system in the hopes of speeding builds up and doing safe incremental builds more often (esp. for testing). I’ve heard Bazel highly recommended, though the people recommending it have typically had some Google affiliation in their past. I have no personal experience with any of the C++ options outside of make, so I’m curious what others think.

Thanks,
Sean

The issue here is user-friendliness – the current make system has been specialized sufficiently well that it works out of the box for most systems we encounter, but we’ve had no end of trouble with tools like cmake. If we can find something that works better than make, works out of the box on Linux, Mac, and Windows, and we’re willing to translate the current build configuration over then I think most of us would be receptive.

I did get cmake working in a few different contexts, but once I was done it turned out I needed some command-line invocations in there to get the protocol buffers and Stan models to build. The error messages are terrible (it’s an auto-generated makefile so… :/ )

I will give Bazel a shot at some point, but I tried make, cmake, R’s remake, and one of the Python-based tools, and all of them have weird stuff embedded. For straight C++ I’d go with Bazel or cmake, but Stan isn’t straight C++ (we sometimes auto-generate C++). Maybe going by the Bazel protobuf scripts we could do it right in that system, but I haven’t had time to push it.

K

Others have also tried to port our makefiles to cmake and eventually gave up.

My own experience installing third-party software is that cmake almost always fails in cryptic ways, whereas makefiles usually work for me.

I’m pretty sure the bottleneck in our testing is because of dependencies, not because make is slow or bad at figuring them out.

Because the tests bring in high-level headers (like prim/mat.hpp), any change to any file included by mat.hpp requires us to rebuild the world. I’d be surprised (but very happy!) if another system could get around this in a cleverer way, rebuilding only if the things you use from a header change, not whenever the header itself changes.
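To make the "rebuild the world" point concrete, here is a minimal Python sketch of file-level dependency tracking, which is roughly how make sees headers. It is only a toy model: the include graph and file names are made up for illustration and are not Stan's real layout, and `transitive_includes` and `stale_tests` are hypothetical helper names.

```python
# Toy model of file-level dependency tracking, the way make sees headers.
# The include graph and file names are hypothetical, not Stan's real layout.

INCLUDES = {
    "test_add.cpp": ["mat.hpp"],
    "test_exp.cpp": ["mat.hpp"],
    "test_util.cpp": ["util.hpp"],
    "mat.hpp": ["add.hpp", "exp.hpp"],
    "add.hpp": [],
    "exp.hpp": [],
    "util.hpp": [],
}


def transitive_includes(path, graph):
    """Return every file reachable from `path` through include edges."""
    seen = set()
    stack = list(graph.get(path, []))
    while stack:
        header = stack.pop()
        if header not in seen:
            seen.add(header)
            stack.extend(graph.get(header, []))
    return seen


def stale_tests(changed, graph):
    """Tests a file-level build tool must rebuild after `changed` is edited."""
    return sorted(
        src
        for src in graph
        if src.endswith(".cpp") and changed in transitive_includes(src, graph)
    )


print(stale_tests("exp.hpp", INCLUDES))   # both tests that include mat.hpp
print(stale_tests("util.hpp", INCLUDES))  # only the test that includes util.hpp
```

In this toy graph, `test_add.cpp` only needs what is in `add.hpp`, but it still gets rebuilt when `exp.hpp` changes, because `exp.hpp` is reachable through the umbrella header. A tool would have to track dependencies below the file level to avoid that.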

Another reason things are so slow is that Stan’s largely header-only and the template metaprograms are very slow to compile.

A third reason is the combinatorics in the probability functions. We’re not comfortable with fewer tests because the underlying code can branch on any of the distinctions in input type to have different behavior. They’ve saved us on many occasions.
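For a rough feel of how fast those combinations grow, here is a small Python sketch. The four argument kinds are an assumption for illustration (the real set of types the tests cover may well differ), and `signatures` is a hypothetical helper name.

```python
from itertools import product

# Assumed set of argument kinds, for illustration only;
# the real list covered by the tests may differ.
ARG_KINDS = ["double", "var", "vector<double>", "vector<var>"]


def signatures(n_args, kinds=ARG_KINDS):
    """Enumerate every combination of argument kinds for an n-argument function."""
    return list(product(kinds, repeat=n_args))


# A three-argument density (think of a normal with y, mu, sigma):
combos = signatures(3)
print(len(combos))  # 4 ** 3 == 64
print(combos[0])    # ('double', 'double', 'double')
```

Under these assumptions a single three-argument function already has 64 instantiations to test, and each one is a separate set of template instantiations for the compiler to chew through.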

The appealing feature of cmake is that you can specify C++ builds really cleanly, but the moment you have to do something more complex (e.g., our testing) the scripts turn into a nightmare of custom_command/target thrown all over the place.

I’d be happy if we could move from make. I’ve thought about this for a while. Here’s what we need to do:

  • Math. For developers only:

    • Build CVODES library
    • Build GTEST
    • Be able to generate tests
    • Build individual unit tests
    • Build doxygen doc
  • Stan. For developers only:

    • Build CVODES library
    • Build GTEST
    • Build a test version of the Stan compiler
    • Build individual unit tests
    • Build doxygen doc
    • Build latex manual
  • CmdStan. For users:

    • Build the Stan compiler
    • Build stansummary
    • Build CVODES library
    • Pass Stan program through the compiler, then compile the generated C++
  • CmdStan. For developers:

    • All of the above
    • Build doxygen documentation
    • Build manual

It’s easy to require developers to install more tools. For the users, we should think about dependencies. With our current CmdStan build system, the dependencies are:

  • Linux: preinstalled make, C++ compiler
  • Mac: Xcode-installed make, C++ compiler
  • Windows: RTools-installed make, C++ compiler

Why do we use make during runtime / for users? Can we keep that subsystem the same while upgrading the developer build system?

What do you mean "make during runtime / for users"?

Oh, I think maybe I misinterpreted what you were saying - for the CmdStan users you meant users who are compiling CmdStan, not users who are using CmdStan. Are there a lot of people who do that?

To use CmdStan you need make.

CmdStan users need to have make to run CmdStan. If they have a Stan program, say foo.stan, they need to be able to create an executable from foo.stan. This requires:

  • bin/stanc having been built
  • Passing foo.stan to bin/stanc generating foo.cpp
  • Compiling foo.cpp to an executable
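The last two steps can be sketched as plain commands, with no make involved. This is only an illustration in Python, not how CmdStan actually does it: `build_plan` is a hypothetical helper, the `--o=` output flag for stanc and the include directories are assumptions, and a real compile line would need more flags than this.

```python
def build_plan(stan_file, include_dirs, cxx="c++", stanc="bin/stanc"):
    """Return the two commands that turn foo.stan into an executable, as argv lists."""
    if not stan_file.endswith(".stan"):
        raise ValueError("expected a .stan file")
    stem = stan_file[: -len(".stan")]
    cpp_file = stem + ".cpp"
    # Assumption: stanc takes the output path via --o=...; check your stanc version.
    translate = [stanc, stan_file, "--o=" + cpp_file]
    # Assumption: include paths are enough for the header-only parts;
    # optimization flags, other libraries, etc. are left out of this sketch.
    compile_cmd = [cxx] + ["-I" + d for d in include_dirs] + [cpp_file, "-o", stem]
    return [translate, compile_cmd]


# Hypothetical include directories, for illustration only.
for cmd in build_plan("foo.stan", ["path/to/stan", "path/to/math"]):
    print(" ".join(cmd))
```

The function only builds the command lists and prints them; it does not run anything. Each list could be handed to `subprocess.run` or written into a shell or batch script.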

We don’t strictly need make to compile files.

Ah, right, sorry. So using CmdStan implies compiling the rest of Stan and the math library? Hm, yeah it would suck a bit to force users to install something like Bazel to use CmdStan…

In broad strokes, yes, that’s what it implies. We could add a shell script and a Windows batch script to compile if we wanted even fewer dependencies. But I find that make is typically installed with whatever toolchain is available.

Most of Stan is still header-only, so we just need to include Stan and Math libraries. CVODES is the exception, and in fact, we can compile without the library if necessary (and if we don’t use the CVODE ODE solver). We tend to just build it because we’ve figured out that process reliably.

@seantalts @danluu If the cmake thing has legs, I’m still cheering for it. I am pretty intrigued by the idea that CLion can make it super easy for me to navigate the math library if it has a CMake file to go from.

Ofc. changing Stan’s build system so I can try a commercial IDE to make up for my lack of emacs-fu is pretty terrible reasoning, but whatevs…

Oh wow talk about a zombie… I thought that was March 17th, 2018 hahaha. Oooops

Buhrains?

I think it has legs! The PR was closed just because no one was working on it actively. Dan Luu estimated there’s a week left of work on it.

We could treat the builds separately.

It’d be easy replacing CmdStan’s build process. We could start there.
Stan’s a little complicated due to building the compiler (needing to instantiate templates), but should be relatively straightforward with any build system that handles C++.
Math is complicated due to our testing, but otherwise it’s straightforward.

I’m happy to talk through the build requirements for any of those three libraries. If someone knows how to use a build tool really well, we can switch.

Unfortunately, I’ve been doing non-Stan stuff lately, but the last time I tried, the cmake branch of math seemed fine (I was using it for my own development, CLion worked fine with it, etc.) with the caveat that it didn’t build the distribution tests. It’s possible that it needs to be updated to get it back to its previously working but not finished state.

The next time I have time to do Stan stuff, I think I’m going to try to finish the benchmarking I started (unless someone else already did it), and then I may take a look at this afterwards, but there’s a lot of potential work, so no promises :-).

@syclik do you think it could be worth merging the cmake stuff we have, since that provides IDE support? Or do we want to wait for it to reach feature parity with make so we aren’t maintaining two systems? I can see both sides here, though I’m a little inclined to think the cmake branch might get more momentum if we check it in…