Standard for reviewing pull requests in the Math library


#1

I’m going to write down some guidelines later this week for standards for both reviewing and writing pull requests for the Math library this week. This should manifest as an updated PR template and wiki in the Math library.

I haven’t written it out yet, but welcome any / all feedback as it gets written. I know that the quality of review seems (or is) arbitrary (for example, see comments or a recent pull request). Some things have to be open to interpretation, but hopefully we can settle on what should be done in every review and what things the reviewer should ask also.

I’ll try to structure it as general guidelines and a (minimal) checklist. Hopefully we can make things clearer for developers, at least at the Math library.


#2

cc @Bob_Carpenter, @wds15, @seantalts.


#3

Thanks—that was me :-)


#4

History

The Stan project started before pull requests were even a thing. We started out on subversion without thinking about branches or permissions on who could and couldn’t commit. We barely had any test coverage. We had no official process for the longest time–and this was ok. We all wanted to be kind to our other teammates and problems didn’t live in the main code base for long. It was Matt, Bob, and myself sharing the same office and if something broke, we could tap on someone’s shoulder and get it fixed immediately (within minutes).

As the project grew, the process evolved. We moved from our own managed subversion server in Andrew’s office to Google Code and finally to git and GitHub. We also split the single Stan project into many pieces, Math being one of them. As more developers joined (@bgoodri, @Marcus_Brubaker, @Maverick, Peter, @betanalpha, and others), we needed the state of the code to be in a trustworthy state so that our work wasn’t affected and our changes didn’t affect the other work. As we discovered new problems, we added new types of tests to prevent that particular problem from popping up again.

All of that’s just to say that our process is something that emerged, may not be as efficient as something designed, but most things are there for a reason.


#5

Goals

As I’m thinking about the standards for pull requests, I’ve had a chance to reflect on what we’re trying to accomplish. First, the Math library needs to be stable. Alone, it’s not much, but the rest of the Stan ecosystem depends on it working: correct and buildable.

Beyond the code, we want the community of developers that support the Math library to treat the project as an open-source project. Individuals contribute to the Math library so it gets better. There isn’t a sense of code ownership – once code makes it into the repository, we can all modify it and are a bit responsible for it. We want discussions to be in the open and in a place where others can follow along or comment. We’ve already had a lot of wonderful contributions from having open dialogs.

Prototypes

Prototypes are great! The default place for discussions on prototypes should be here on discourse. Sometimes there might be an issue already open and discussion can go there. The goal of a prototype is to show a proof of concept. It’s to show that something could possibly work.

Most prototypes aren’t pull request worthy; and that’s ok! There’s a bit more to getting something ready for inclusion in Stan.

Pull Requests

New features, bug fixes, refactors, and essentially any code change comes in the form of a pull request. We would really like it to follow this wiki: Developer process overview (we may adjust this for Math):

  1. Create an issue on GitHub
  2. Create a branch for the issue
  3. Fix the issue on the branch
  4. Create a pull request
  5. Code review
  6. Address code review (if necessary)
  7. Merge into develop

All development is done by individuals that may have different reasons for creating a pull request, but when a pull request is submitted, it’s really about the contribution to the Math library. Once the pull request is in the Math library, the responsibility lies on all the developers (present and future!).

This is why we want:

  • code to be styled consistently (it’s not about your preference, but about reducing the time it takes others to read your code)
  • contributions to be idiomatic to the language – that’s either C++, doxygen doc, make, C preprocessor macros, etc.
  • readability; the code is written once, but will be read by a reviewer and then again some time in the future. It’s not about how it’s written, but how it’s read.
  • documentation to accompany the pull request
  • tests; we don’t want to add technical debt to the Math library with every pull request
  • design to be reasonable. we want someone else in the future to look at the code and understand the design
  • it to build. This is tough. Right now, we support Mac, Windows, and Linux operating systems and g++ (4.6+) and clang++. We actually have users in Mac that use g++ 4.6, 4.9, clang++ 3.0 / 9.0, and some older version of clang++, Windows users that use g++ 4.6, 4.9, and Linux users that use g++ and clang++. As we move into GPU and MPI territory, we have to keep the math library building for all our existing users.

I know the test burden is high for the math library. I’ll try to give some motivation why:

  • We instantiate the code in all the ways that the Stan library could instantiate the Math library. We do this because we’ve encountered many compiler bugs where things that should work don’t. And this might only show up with a particular version or even on a particular OS. I don’t know of any other way to test it other than to instantiate it.
  • To be responsible to other developers and users. We really shouldn’t push code that’s never been exercised in our tests, at least once. You might be good enough to write code that always works, but I’m not. And if it needs to be changed, tests should be there.
  • For maintenance. Whenever we encounter a bug, we fix it by first recreating it. It’s a lot easier adding a test case when a function has been instantiated with the right types than trying to figure out how to instantiate it in the first place. It’s the difference between setting up something in a couple minutes vs taking a couple hours in the future.
  • For stability. We write a lot of unit tests, but we still don’t have much test coverage. More tests allow us to update the codebase quicker while trusting the results.
  • We also check output messages. These actually make it out to the user, so it helps to be consistent and make sure these are ok.

In short, we’re trying to contribute something to the Math library that’s good and will not incur much technical debt. It also helps if the contributor takes time to guide the reviewer by highlighting the design or the major changes.

Code Reviews

Code reviews should be taken seriously and is carried out by developers of the math library. One or more developers have to review pull requests; most often, they might review different parts of the pull request. For example, I might look at the changed build instructions while someone else looks at the template metaprogramming.

When evaluating the code, we should be evaluating it from the point of view of the Math library and defending it. I always assume the contributor has a good reason for putting effort into a pull request – the content of the pull request does something good for the project. The review is a chance for us to evaluate whether that’s true and whether it’s going to be maintainable.

We should be thinking about how easy it is to read the code. Once it’s in, the collective developers have the responsibility of maintaining it. We should think about how hard it would be to replace the code later on. We should be asking if there are easier ways to do things so that it’s less complex. We need to be looking at design. We should also ask if there are enough tests – at least to convince yourself that it’s right (enough) for users to push on all the corners of that code. The litmus test is whether you’d be comfortable hunting down a bug in that code if it were to break.

Code breaks. We’ll deal with it. It’s a matter of how easy or hard it is to deal with later on. I’m never a fan of “we’ll document it later” or “we’ll test it in a future pull request” – that technical debt almost never gets paid off. Even with the best intentions, the individual that contributes code isn’t always the one that fixes it. That’s part of the challenge of open-source software.


#6

P.S. I didn’t think though how this is all going to be stitched together. I just started writing. I’ll try to summarize and get it into wiki + pull request template form.


#7

Thanks for writing this up. I think it’s a good perspective, and I especially appreciate the goals of readability and maintainability.

I’d like to frame this discussion in terms of costs and benefits. All else being equal, more testing is better. But all else isn’t equal. Some of the costs we need to take into account are:

  • time it takes to write the tests delays release of features being tested
  • maintenance of tests as interfaces and other underlying patterns change take time to fix proportional to the size of the test suite (if tests grow linearly and time to fix at big events is proportional to the size of the test suite, overall maintenance cost is quadratic in the size of the test suite)
  • changing rigid tests impedes refactoring
  • user local and CI testing slows down the overall development cycle
  • opportunity cost of not developing other features instead of testing existing features
  • poor test code is hard to maintain, but good test code is more time consuming to write

The right balance for Stan is somewhere between the reliability of autopilot software for airplanes and the usual academic research software, but I’m a little perplexed about how to define where the optimum point is.