Level of Review Necessary for (math) PRs

Following on from this discussion on math issue #1321, @syclik asked me to formulate what I’d like to see in a necessary review policy. I think this is fairly simple, so I’m curious as to what level of detail’s going to be necessary here.

This policy is only complicated by the fact that there are two notions of external: external as in used by the Stan language and external as in used by other parts of the math lib or developers looking to extend the math lib.

Review policy

  1. All functions in stan::math and not in internal namespaces will be tested for all boundary conditions in a way that exercises all of the underlying code.

  2. Functions in internal namespaces can be used within files without documentation, but if they are used by multiple internal files, they need documentation and tests.
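
To make rule 1 concrete, here’s a minimal sketch of the kind of boundary coverage I have in mind (the function and test are invented for illustration, not actual Math code):

```cpp
#include <gtest/gtest.h>
#include <cmath>
#include <limits>
#include <stdexcept>

namespace stan {
namespace math {
/**
 * Return the reciprocal of the specified value.
 *
 * @param x value to invert
 * @return 1 / x
 * @throw std::domain_error if x is 0
 */
inline double hypothetical_inv(double x) {
  if (x == 0) {
    throw std::domain_error("hypothetical_inv: x is 0");
  }
  return 1.0 / x;
}
}  // namespace math
}  // namespace stan

// Rule 1: exercise the boundaries, not just a happy-path value.
TEST(MathFunctions, hypothetical_inv) {
  EXPECT_DOUBLE_EQ(0.5, stan::math::hypothetical_inv(2.0));
  EXPECT_DOUBLE_EQ(0.0, stan::math::hypothetical_inv(
                            std::numeric_limits<double>::infinity()));
  EXPECT_TRUE(std::isnan(stan::math::hypothetical_inv(
      std::numeric_limits<double>::quiet_NaN())));
  EXPECT_THROW(stan::math::hypothetical_inv(0.0), std::domain_error);
}
```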


Thanks for writing this up! I think it’s a good start. The people at the Math meeting this month (@stevebronder, @rok_cesnovar, @wds15, me) started talking about it in person.

That’s a great question. I think it needs to get down to what we want to use the policy for:

  • overall, they will be rules that, if followed, result in PRs that benefit the Math library. Since this is a policy for code, it describes the properties we expect the codebase to have.
  • for the developer, it indicates what’s expected on submission.
  • for the reviewer, it indicates what to check for to keep the codebase up to standards.

Here are some of the things we discussed:

  • basic documentation for internal functions is useful, even if it’s not full doxygen documentation; just something to explain the function’s use.
  • it’d help to have a general statement saying that it’s preferred not to add new internal functions when existing functions already cover the need. Without that, we aren’t encouraging people to reuse existing functions.
  • naming conventions for internal functions. Since this proposed policy puts internal functions inside the files they’re used in, we can run into naming conflicts, as well as cases where nobody realizes a similar function already exists under different spelling, casing, etc. So I think we need to provide guidance on what to call these functions (a sketch follows this list).
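
For example (a hypothetical convention, not settled policy), a descriptive, specific name plus a one-line comment goes a long way toward avoiding both problems:

```cpp
namespace stan {
namespace math {
namespace internal {

// Lightweight doc: one line saying what the helper is for, even
// without full doxygen. A descriptive, specific name (rather than a
// generic one like check() or helper()) reduces the chance of
// colliding with a helper in another file.
inline bool is_unit_interval(double p) { return p >= 0 && p <= 1; }

}  // namespace internal
}  // namespace math
}  // namespace stan
```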

Before trying to update the draft, maybe we can discuss whether these things are even important to us?

I’d prefer to see short, well-named functions. But a little doc note can sometimes help. I wouldn’t want to discourage writing more API doc!

That’s just a general programming rule. We also want them to favor using standard library, Boost, or Eigen functions over implementing their own.

I don’t see how this is any different from non-internal functions. The compiler will catch violations of the one-definition rule within a translation unit. I’m not sure the linker will find mismatched definitions across translation units, but then it’s not so important that it does if the functions stick to single-file use.
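
To illustrate the subtlety (hypothetical file and function names; Math functions are inline and header-only, which is exactly the case where the toolchain can stay silent):

```cpp
// foo.hpp (hypothetical)
namespace stan {
namespace math {
namespace internal {
inline double scale(double x) { return 2.0 * x; }
}  // namespace internal
}  // namespace math
}  // namespace stan

// bar.hpp (hypothetical): same name and signature, different body
namespace stan {
namespace math {
namespace internal {
inline double scale(double x) { return 10.0 * x; }
}  // namespace internal
}  // namespace math
}  // namespace stan

// Including both headers in one translation unit is a redefinition
// error the compiler catches. But if one .cpp includes only foo.hpp
// and another includes only bar.hpp, everything compiles and links,
// and the linker silently keeps one of the two inline definitions:
// an ODR violation with no diagnostic required.
```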

I think I’m just looking ahead a little bit. Although these are general programming rules that some feel are good, if they’re not part of the policy, they’re open to interpretation. Given the range of developers who come to the project, not all have backgrounds in software, so I think providing them with clearer guidance is important.

Yes. When we realized this, we stopped treating anonymous namespaces as internal functions. If we don’t have something written about this, I’m guessing we’ll have PRs that run into a conflict, and rather than dealing with having to extend the existing function, a contributor will do something like tack a 2 onto the function name, overload the function with different arguments, or find some other creative way around the naming conflict. That puts the responsibility on the reviewer not only to catch this, but also to inform the submitter that we have an unwritten policy about it. That’s what I’m hoping to limit; it’d be much better if we could just point back to a policy that’s concrete about this, so it doesn’t seem like the reviewer is making up rules about duplicate functions.

Btw, I’m not opposed to changing the level of review necessary. I just want to be explicit about what we expect; it’s supposed to be both informational to the submitter and the reviewer. It’s the standard that we try to maintain. As a policy, I think we need to be a little more complete. (The existing one is “all functions must be tested and all functions must be documented.”)

I think it would be simpler to discuss where our philosophy differs from generally accepted programming practices like “use library functions rather than writing your own.” The problem with writing our own how-to-code guide is that we don’t know where to stop, because the set of such practices a newbie needs to learn is huge. Here’s Sutter and Stroustrup giving it a go for C++:

It’s a gazillion pages long, but doesn’t go as deep as Meyers’s books on the topics it covers. It includes “prefer library functions over writing your own,” among literally hundreds of other pieces of advice at the same level.

This is another basic rule of programming: don’t repeat yourself, as The Pragmatic Programmer put it. I hardly see how someone can think we’re making stuff up when it’s in almost every intro to programming style. But my bigger point is that we’ll wind up with a novel if we want to start legislating all this stuff explicitly; certainly something as big as the CppCoreGuidelines or Google’s style guide.

That’s still vague. All functions you touched during a refactor? Does it need to include all boundary and exception conditions? Does it need to describe the algorithms and race conditions for threading? Do we need the code itself to comply, with noexcept on every function that doesn’t throw an exception and const everywhere it’s possible?
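
For anyone less steeped in the C++ side of that last question, these are the decorations in question (a minimal, illustrative sketch; the names are invented):

```cpp
#include <vector>

// noexcept documents (and enforces) that a function will not throw;
// if it throws anyway, std::terminate is called.
inline double square(double x) noexcept { return x * x; }

class accumulator {
 public:
  // const member function: promises not to modify the object.
  double total() const noexcept { return total_; }

  // const reference parameter: promises not to modify the argument.
  void add(const std::vector<double>& xs) {
    for (const double x : xs) {
      total_ += x;
    }
  }

 private:
  double total_ = 0;
};
```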

But most critically, do functions need to be small and single purpose rather than vast? That’s what we want to stop, but our rules for testing actually discourage that.


I think you’ve identified where we have differing opinions. I never saw this as something that we could turn into a sensible policy, so it was never a “need” but a “want.” It’s pretty easy to determine whether a function is single purpose, but it is really hard to determine what’s big or small for code. If this were in our policy, I don’t know where we’d stop in terms of splitting functionality apart.

My opinion has been to let the contributor decide on the implementation details conditioned on:

  1. all functions must be tested
  2. all functions must be documented.

Practically, it’s easier to test a complex function if it’s split into multiple parts. But as you’re pointing out, multiple parts means more overhead. So it’s a balancing act… and that’s what I thought was reasonable and that’s what I wanted to encourage.
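
As a sketch of that balancing act (hypothetical functions): the split version below has two extra pieces to name and maintain, but each piece can be checked on its own:

```cpp
#include <cmath>
#include <stdexcept>
#include <vector>

namespace stan {
namespace math {
namespace internal {
// Split out: the validation and the reduction can each be tested in
// isolation...
inline void check_nonempty(const std::vector<double>& xs) {
  if (xs.empty()) {
    throw std::invalid_argument("log_mean: xs is empty");
  }
}

inline double mean(const std::vector<double>& xs) {
  double sum = 0;
  for (double x : xs) {
    sum += x;
  }
  return sum / xs.size();
}
}  // namespace internal

// ...at the cost of two more names to document and maintain.
inline double log_mean(const std::vector<double>& xs) {
  internal::check_nonempty(xs);
  return std::log(internal::mean(xs));
}
}  // namespace math
}  // namespace stan
```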

I don’t have a desire for every function to be the smallest possible, and I don’t have a desire for there to be only external-facing functions. What I do want, as a policy, is for every contribution to the codebase to be maintainable going forward. That’s sort of my ideal precondition: if all code in the codebase is maintainable and every contribution is maintainable, we continue to have a maintainable codebase. If we give that up, I see us drifting further from that and ending up with more and more technical debt, enough that we can’t maintain the library anymore.

That’s definitely useful. But I think we should also list the fundamental tenets we believe in and what our standard is. Yes, there are generally accepted programming practices, but not every contributor knows them, and there are conflicting practices depending on which field or language is your primary one.

I also don’t want a novel. There has to be some middle ground. Maybe a preface to this policy that lists 5 major rules or principles. Or at least links to which guidelines we follow more closely than others.

Thank you for asking these. I thought this was all clear, but it’s becoming apparent that it’s not.

  1. Yes, test all functions you touched during a refactor. (If there were already tests in place, this is easy.)
  2. No, not all boundary conditions and not all exception conditions. (I’m reading this as essentially branch coverage for every boundary condition; if you meant something different, please let me know.) Ideally, yes, but practically, this is really hard to test. For code to be maintainable, tests need to exist. More specifically, there has to be at least one test that instantiates the function in a stand-alone way. If that’s really difficult to do, then the design can probably be improved. If it’s difficult and this test doesn’t exist, then it’s going to be even harder for another person to instantiate the function when there’s a bug. Maintainability isn’t the only goal in the policy; we have to think about correctness too. So from that point of view, there has to be at least one test that demonstrates that the code does what it’s supposed to do. To my mind, that’s testing that it works on a good case and making sure it triggers any exceptional cases (see the sketch after this list). If boundary conditions are important, then those should be tested too, but once you’re past tests for making sure the function does what it advertises, all of this is a judgement call in my book. (Btw, we failed to even test that functions were instantiable early on, and users used to run into compiler errors when they wrote Stan programs.)
  3. “Does it need to describe the algorithms and race conditions for threading?” This is a really good question. Hopefully you’ll agree that prior to threading being introduced, we didn’t need to worry about this at all. Threading really didn’t come into the library for free; we now have to worry about race conditions. So yes, new contributions should come with doc that describes race conditions for threading, and the doc must if the code has a race condition. Ideally, we would have made this effort at the time threading was introduced, but we didn’t. Threading is in now, so we have to consider it with every contribution going forward. At the time the threading PRs went in, I tried to make it clear that we were introducing this burden on every future contribution and that we should be taking the effort to identify race conditions. But it’s tricky… really tricky. I’m sure new contributions have race conditions we haven’t triggered.
  4. We never wrote a policy for that, and we never enforced it. So no, but it was always appreciated. If we want that to be required, then we need to make it policy so that contributors and reviewers know it’s required; until that point, it’s optional. To make improvements like this to the codebase, we need to set it as policy, enforce it on new contributions as they come in, simultaneously open issues to fix the existing code, and, of course, pay down that “technical debt” with work.
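
As a sketch of the minimum bar from point 2, reusing the hypothetical log_mean from the earlier sketch: one stand-alone instantiation on a good case, plus the exceptional case the doc advertises:

```cpp
#include <gtest/gtest.h>
#include <stdexcept>
#include <vector>
// Assumes the hypothetical log_mean from the earlier sketch is in scope.

TEST(MathFunctions, log_mean_minimal) {
  // Good case: the function instantiates stand-alone and does what it
  // advertises.
  std::vector<double> xs{1.0, 1.0, 1.0};
  EXPECT_DOUBLE_EQ(0.0, stan::math::log_mean(xs));  // log(1) == 0

  // Exceptional case: the advertised throw actually triggers.
  EXPECT_THROW(stan::math::log_mean(std::vector<double>{}),
               std::invalid_argument);
}
```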

Does what I wrote above make sense? I thought “all functions must be tested and all functions must be documented” actually was enforceable and clear, and balanced vast, complicated functions against smaller ones. But… I’m sure I didn’t communicate that clearly enough.

One other thing: it would be nice to codify the policy into a PR checklist for the reviewer to go through. When an item fails, the PR should just be kicked back to the contributor. (The same checklist should also be available to the contributor prior to submitting.)

I just put up a first go at a contributing.md and code of conduct over on the math repo

@seantalts, guessing as SGB chair you’ll want to look over this as well. It may be good to add it to the Stan and other repos too.

Thanks! I’m not the SGB chair but I took a quick look - it was hard to tell what changed because the file moved.

Apologies, but ty! Would it be easier to see if I moved the file back during review? Nothing was deleted from the original contributor guide; I just added some sections.

That would be easier. Also why did you move it? They’re part of the github PR and issue templates: https://github.blog/2016-02-17-issue-and-pull-request-templates/

Ah! I actually didn’t know putting them there would let github know about them

https://help.github.com/en/articles/setting-guidelines-for-repository-contributors

Yeah sure we can put both of those in the .github folder. I wanted them at the front because, tbh, I never saw the contributing.md haha. Maybe just putting a link to it in the readme.md would suffice

I guess they’re technically different documents… anyway, I don’t feel strongly as long as people still get some PR and issue templates. Maybe those just link out to the main one. And it’d be like 0.1% nicer to either move and then review new changes or review and then move, but not both at once.

¯\_(ツ)_/¯

I’ll just move it over tmrw

Yes, I think that pretty much all makes sense. I still think it’s going to be hard to start laying down a small bunch of C++ rules—there are just so many of them in play.

I’m trying to get to pretty full coverage with the autodiff tests. What I’m seeing is that our domain error handling and boundary condition handling are all over the place. This is evident in the current inverse PR.


Just looping back. Since we didn’t come to an understanding of a different policy to adopt, do we stick with what we currently have?

  1. all functions must be tested
  2. all functions must be documented

I know. It’s hard. But we have contributions from a lot of contributors with different backgrounds. Even those that use C++ day-to-day might have vastly different standards; I think it’s good to indicate what C++ rules we care about, even if they’re really small rules, just so it’s clear. It doesn’t have to be in our policy doc, but it should be somewhere.

cc: @stevebronder @rok_cesnovar

I’m afraid so. I don’t really want to spend a lot of time working on fine-grained policy details.