I think you’ve identified where our opinions differ. I never saw this as something we could turn into a sensible policy, so it was never a “need” but a “want.” It’s pretty easy to determine whether a function is single-purpose, but it’s really hard to determine what counts as big or small for code. If this were in our policy, I don’t know where we’d stop in terms of splitting functionality apart.
My opinion has been to let the contributor decide on the implementation details conditioned on:
- all functions must be tested
- all functions must be documented
Practically, it’s easier to test a complex function if it’s split into multiple parts. But as you’re pointing out, more parts mean more overhead. So it’s a balancing act… and that balance is what I thought was reasonable and what I wanted to encourage.
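To make that trade-off concrete, here’s a minimal sketch (hypothetical code, nothing from the library) of a function split so that the inner piece can be documented and tested stand-alone:

```cpp
// Hypothetical example: weighted_mean split so that the reusable dot product
// can be documented and tested on its own.
#include <cstddef>
#include <stdexcept>
#include <vector>

/**
 * Return the sum of elementwise products of x and w.
 * @throws std::invalid_argument if x and w differ in size.
 */
inline double dot(const std::vector<double>& x, const std::vector<double>& w) {
  if (x.size() != w.size()) {
    throw std::invalid_argument("dot: x and w must be the same size");
  }
  double sum = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    sum += x[i] * w[i];
  }
  return sum;
}

/**
 * Return the weighted mean of x with weights w.
 * @throws std::invalid_argument if sizes differ or the weights sum to zero.
 */
inline double weighted_mean(const std::vector<double>& x,
                            const std::vector<double>& w) {
  double total = 0.0;
  for (double wi : w) {
    total += wi;
  }
  if (total == 0.0) {
    throw std::invalid_argument("weighted_mean: weights sum to zero");
  }
  return dot(x, w) / total;
}
```

The split is overhead, but it means dot gets its own doc and its own test instead of only being exercised indirectly through weighted_mean.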
I don’t have a desire for every function to be the smallest possible. And I don’t have a desire for there to only be external-facing functions. What I do want, as a policy, is for every contribution to the codebase to be maintainable going forward. That’s sort of my ideal precondition: if all code in the codebase is maintainable and every contribution is maintainable, we continue to have a maintainable codebase. If we give that up, I see us getting further from that and ending up with more and more technical debt, enough that we can’t maintain the library anymore.
That’s definitely useful. But I think we should also list the fundamental tenets we believe in and what our standard is. Yes, there are generally accepted programming practices, but not every contributor knows them, and there are conflicting practices depending on which field or language is your primary one.
I also don’t want a novel. There has to be some middle ground. Maybe a preface to this policy that lists 5 major rules or principles, or at least links to the guidelines we follow more closely than others.
Thank you for asking these. I thought this was clear, but it’s becoming apparent that it’s not.
- Yes, test all functions you touched during a refactor. (If there were already tests in place, this is easy.)
- No, not all boundary conditions and not all exception conditions. (I’m reading this as essentially branch coverage for every boundary condition; if you meant something different, please let me know.) Ideally, yes, but practically this is really hard to test. For code to be maintainable, tests have to exist; more specifically, there has to be at least one test that instantiates the function in a stand-alone way. If it’s really difficult to do this, then the design can probably be improved. If it’s difficult and this test doesn’t exist, then it’s going to be even harder for another person to instantiate the function when there’s a bug. Maintainability isn’t the only goal in the policy; we have to think about correctness too. So from that point of view, there has to be at least one test that demonstrates the code does what it’s supposed to do. To my mind, that means testing that it works on a good case and making sure it triggers any exceptional cases. (There’s a sketch of what I mean after this list.) If boundary conditions are important, then those should be tested too, but once you’re past tests showing the function does what it’s advertised to do, all of this is a judgement call in my book. (Btw, early on we failed to even test that functions were instantiable, and users used to run into compiler errors when they wrote Stan programs.)
- “Does it need to describe the algorithms and race conditions for threading?” This is a really good question. Hopefully you’ll agree that before threading was introduced, we didn’t need to worry about this at all. Threading really didn’t come into the library for free; we now have to worry about race conditions. So yes, new contributions should come with doc that describes their behavior under threading; they must if they have a potential race condition. (There’s a sketch of the kind of doc I mean after this list.) Ideally we would have made this effort at the time threading was introduced, but we didn’t. Threading is in now, so we have to consider it with every contribution going forward. At the time the threading PRs were introduced, I tried to make it clear that we were putting this burden on every contribution going forward and that we should be taking the effort to identify race conditions. But it’s tricky… really tricky. I’m sure new contributions have race conditions we haven’t triggered.
- We never wrote a policy for that and we never enforced it. So no, but it was always appreciated. If we want it to be required, then we need to make it policy so that contributors and reviewers know it’s required; until that point, it’s optional. To make improvements like this to the codebase, we need to set the policy, enforce it on new contributions as they come in, open issues for the existing code at the same time, and, of course, pay down that “technical debt” with work.
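On the testing point, here’s a minimal sketch of what I mean by a good-case test plus the exceptional cases, assuming GoogleTest and the hypothetical weighted_mean from the earlier sketch:

```cpp
// Minimal sketch, assuming GoogleTest and that the hypothetical weighted_mean
// from the earlier sketch is available (e.g., through a header).
#include <gtest/gtest.h>
#include <stdexcept>
#include <vector>

TEST(weightedMean, worksOnAGoodCase) {
  std::vector<double> x{1.0, 2.0, 3.0};
  std::vector<double> w{0.5, 0.25, 0.25};
  EXPECT_DOUBLE_EQ(1.75, weighted_mean(x, w));
}

TEST(weightedMean, throwsOnMismatchedSizes) {
  std::vector<double> x{1.0, 2.0, 3.0};
  std::vector<double> w{0.5, 0.5};
  EXPECT_THROW(weighted_mean(x, w), std::invalid_argument);
}
```

And on the threading point, this is the kind of doc I have in mind; the function and the wording are purely illustrative, not a proposed template:

```cpp
#include <cstddef>
#include <vector>

/**
 * Add the elements of x into the shared accumulator acc.
 *
 * Thread safety: this function writes to acc without any locking. Calling it
 * from multiple threads with the same acc is a data race; either give each
 * thread its own accumulator and reduce afterwards, or guard acc with a mutex.
 */
inline void accumulate_into(std::vector<double>& acc,
                            const std::vector<double>& x) {
  for (std::size_t i = 0; i < acc.size() && i < x.size(); ++i) {
    acc[i] += x[i];
  }
}
```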
Does what I wrote above make sense? I thought “all functions must be tested and all functions must be documented” actually was something that’s enforceable and clear, and that balances vast, complicated functions against smaller ones. But… I’m sure I didn’t communicate that clearly enough.
One other thing: it would be nice to codify the policy into a PR checklist for the reviewer to go through. When a PR fails the checklist, it should just be kicked back to the contributor. (The same checklist should also be available to the contributor prior to submitting.)
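As a sketch of what that checklist might cover (just pulling from the points above, not a final list):
- [ ] every new or touched function has at least one test that instantiates it stand-alone
- [ ] there’s at least one good-case test plus tests for the exceptions the function is documented to throw
- [ ] every new or touched function is documented, including thread-safety notes where relevant
- [ ] the contributor went through the same checklist before submitting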