Progress

@bob-carpenter: “Speaking of style, I think we have to figure out how to set aside issues of what we think is the perfect design in favor of making progress forward.” (from math issue #547)

I know some of this is reactions against old coding practices that led to hard to maintain code but as long as a PR is moving the code towards compliance with current practices I don’t think we should require contributors to update it all the way. I’d rather see somebody get done with one PR and still have enough steam to do some clean-up in a separate PR, maybe on the issues that surfaced in the code that they contributed or maybe on some other issues.

Agreed. This is what we need to balance.

The real problem is the asymmetry of the relationship. It’s very easy to comment about a whole bunch of things that could be made better. This isn’t just in others’ code. I could do it with my own code, too!

Having code review keeps me more honest than I might otherwise be, but I think we’ve erred on the side of being too picky (myself included).

I also want to balance doing more thorough reviews for the first few pull requests of a new developer to give them a sense of coding style and testing requirements, but given that these are fuzzy and opinions differ among our developers (Daniel and I often give conflicting feedback), I don’t think we should be jerking around our pull requesters!

1 Like

I think we’re all in agreement.

We want contributions to be easy. Especially for new people.
The other concern is to minimize (not eliminate) technical debt.

New features are never free. There’s always some maintenance cost. Once a new bit of code is in the code base, it becomes the responsibility of the team, not the responsibility of the individual contributor. We haven’t been successful at having people that introduce a problem be the one to fix it. This is, in part, due to the volunteer nature of the project.

Having clearer guidelines and less decision making on the part of the review is good. What’s good enough? What stype is appropriate? What should be tested?

When I look at pull requests, I usually look for:

  • existence of tests. Introducing a new feature without demonstrating that it’s been run once isn’t a good show.
  • testing of known boundary conditions. It’s so much easier for the Stan user when things behave reasonably when they break.
  • having just a second person on the dev team sign off on the pull request. If the person that read the code and the person that reviewed the code can understand it, I’ll assume by induction more people can.
  • adhering to style. The contribution should match the rest of the code base. It’s going to be read by others.

For me, if those three things are met, then I’m pretty sure the team can work with the code if it’s broken. Unfortunately, it usually isn’t the case.

If people are trying to help by fixing existing code, then anything better is good.

Good point that new features should have a different standard than updating old code.

So where does this pull request from Sean fit—is it fixing existing code or introducing new code? It’s a refactor which then enables new features.

I mainly want to avoid the whipsaw we get from two people asking for competing changes. We’re turning around quickly on making requested changes, but then reviews come in later wanting to reverse things. I can sense the frustration of all of our contributors on this.

Daniel makes a good point about downstream maintenance. It’s asymmetric in terms of cost—there’s the up front cost and also the maintenance cost. The real question is how much to front load all of this.

And just how exhaustive we have to be on testing. We’re not building flight controllers, but at the same time I don’t want to keep having to apologize that bits of Stan are broken and giving the wrong answers. I’m less bothered by things just crashing Stan and its container—at least people won’t get the wrong answer and think it’s right.

I’m just going to throw out some thoughts:

  • To me if it’s a refactor it counts as improving old code. As long as the design makes sense and it passes the same tests we had or some more it’s ok.

  • If we’re refactoring code that doesn’t have enough tests to be safely refactored then we’re doing this in the wrong order. We should be up front when some of our code is a (functioning) black box and if somebody wants to mess with it, step #1 should be testing. This should be part of the design discussion or documented in the issue itself so somebody doesn’t get part ways in and

  • The conflicting change requests thing is weird to me—you see the other reviewers’ comments when you comment through github so why just contradict them without engaging with the previous comments? I get that some stuff happens off-github but when a contributor says “X told me to do it this way” I think it’s up to the reviewers to resolve what the recommendation should be.

  • I looked at this PR and I think part of the pain is that the name changes happened at the same time as the refactoring so the PR looks huge but most of it is replacing OperandAndPartials with operands_and_partials… we could probably help contributors plan these PR’s better so they’re easier to review.

That’s all good in theory, but in practice things aren’t so clear cut. Again, this particular change we’re talking about was not unusual, but doesn’t fit into the boxes. It refactored the way OperandsAndPartials worked by completely rewriting it as operands_and_partials. The user-facing functions all use this class under the hood and none of their behavior changed. But it completely replaced the old OperandsAndPartials and none of the tests were still relevant. So we couldn’t have started by testing the older thing more thoroughly.

Exactly—I sat down with Sean and went over the code in person so he could help me understand it. In some sense that’s cheating because now I have understanding that didn’t come from the doc or code itself. And it does make the decision-making opaque. But it’s so much more efficient. Occasionally, we do code reviews at the same time then write up results separately and get conflicts. So maybe we could try a lock mechanism to ensure one review at a time?

In some ways since I do everything remotely it’s simpler! :)

Let me note that this seems like another failure of “I’ll figure out what’s going on by looking at the code”. For these changes to the inner workings of Stan I think we need to supply something other than just the raw PR. I don’t know the right way to formalize or share designs but we need something at that level.

+1 the Wiki write-ups as flawed as they are work pretty well, it’s just that usually almost nobody wants to go through the trouble of doing them.