Growing PR backlog & autodiff test framework

Our pull request backlog is growing alarmingly, especially in the math lib.

I think the main holdup is continuous integration. While I appreciate that it’s saving us from terrible bugs, it’s a huge obstacle for moving forward with anything.

The first problem is that it’s a very demanding set of tests that people can’t really afford to run locally. We run multiple platforms and compilers on lots of tests, incluing downstream tests with dependent packages. There are at least three big serves in play that devs need to wrangle: Travis, Jenkins, and GitHub. Then there’s cpplint and clang format, both of which are requirements. This is a big hurdle, but I haven’t see any way to cut it down while maintaining both code quality and everyone’s sanity.

The second problem is that we don’t have good general ways to test new autodiff functions. Each one’s a crap shoot as to how much a reviewer is going to demand in the way of testing. The mdivide_right pull request in process now is a good example. The previous version had zero tests of the autodiff. Sebastian wrote a more efficient version that did less copying. Now Daniel wants tests. Fair enough. I tried to write those tests and there are 16 combinations of matrix/vector and double/autodiff arguments for the two functions of two arguments each (one argument has to be a matrix). I started the cut-and-paste war, then backed off into a metaprogram, but that’s such a pain that after two functors and four functions that were nowhere near there yet, I gave up and decided to rant on discourse for awhile instead.

The only way out of this I see is to buckle down and write a serious testing framework for multivariate autodiff. Something that’ll take arbitrary functions, double versions (perhaps with expected results in some cases), and test everything. I’ve done this with the univariate functions and know how to do it with higher-order functions, I just can’t find the few weeks of coding I’d need to do it. Instead, I’m trying to work on things like MPI and the language refactoring.

How are we going to find the time to do this?

In the case of matrix stuff like this, it should be sufficient to do some matrix operations on random input that yield an identity matrix. In which case, the norm of the difference between that and an identity matrix should be numerically zero, and the same is true for the derivatives.

For the math library, I think it’s more the lack of volunteers reviewing and having the original submitter address comments. When I’m back, I’ll try to be much more proactive at triaging the pull requests.

Continuous integration is a part of it, but I don’t think it’s what’s actually holding it back. I’m really happy with the way the continuous integration is moving with @seantalts leading it. I think some of the things that have been rough have been the parts that affect developers that weren’t ironed out before they changed our process. Perhaps we can slow down changes there and wait until kinks are worked out until they are implemented.

I’ll come up with something for the backlog in Math. I’ll just have to do it later this week.

One thing we should do is update the pull request template appropriately. And put better guidance on what needs to be tested before things get in. I’ll put that on me to come up with a draft. (It won’t go into practice until there’s at least some formal or informal review.)

Hopefully we can get more help from more people. Seems like people are willing to develop. We just need to enable them to help out with pull requests. Even small updates to tests and doc without needing to get into the design would go a long way.

Testing changes to Jenkins is pretty difficult - so much of the stuff to be tested involves which branch, fork, computer, compiler, etc. the stuff involved is running on. And I made some assumptions (like that clang-format 5.0 refers to a specific version) that didn’t turn out to be true.

I think going forward, I’ve been trying to test things a little more thoroughly and make things very developer friendly (CI systems in my personal history have expected much more from individual developers), but mostly to be conscientious about cleaning up afterwards and shepherding where some new feature broke a pull request.

Looking to the future, I think the autodiff testing framework is a great idea. Maybe the other takeaway from this thread so far is to make efforts to increase modularity and create different levels of abstraction so developers can work on a minimal subset of code at a time. @Matthijs had thoughts about this lacking in the code base, but aside from the scal/arr/mat refactor I’m not sure what other concrete steps anyone’s proposed to make it easier.

1 Like

The latest change from @seantalts was great in that it makes the errors much easier to debug. It will hopefully free me up from helping people spelunk error messages.

I meant getting things through all the required tests and compiler versions and keeping up with other people’s refactorings and changes. There’s a lot of moving parts to deal with.

I think it would help if it feels less subjective. That’s why @seantalts made a big push to get rid of formatting and style commentary. I’m still not sure what’s OK in terms of style to comment on, but I’m trying to err on the side of getting things in rather than getting things to make me happy in terms of coding style, organization, etc.

Yes, but are they willing to test?

Like what?

Sure—I love modularity just as much as the next person if not more. I’d love to hear any suggestions you or @Matthijs have for how to do that. The scalar/array/matrix thing isn’t really a code modularity issue, is it? That’s just about where to put things in files. The problem now is that changing a function in prim means you need a combinatorial explosion of tests up through mix.

Believe it or not, the scalar vs. array vs. matrix thing was an attempt to make the dependencies more modular so that not everything had to bring the whole world into scope and that it was clear where to put files and dependencies. I’m for getting rid of it, but getting rid of it’s going to be another of those speedbumps on everyone’s outstanding pull requests. Same way as what @mitzimorris is doing now is disrupting the entire parser. I don’t see how to pull out the type system and replace it in a more modular way, but maybe you can sit down with her and/or me and have a look and see if you see anything we’ve been missing. Now would be the time to speak up.

Side note I guess but the autodiff testing framework was great once I figured it out in terms of saving time. Except now I have a failing test that’s due to automatic autodiff testing code with macros and I needed a break from it bc it’s too many layers of auto! I can’t even find the code that’s failing (and I’m pretty sure it’s a Nan == Nan comparison so I think it’s a problem with the testing framework…). …

The only useful suggestion in all this from me is maybe we put a little more work into the testing framework itself so it’s not so opaque when it fails.

P.s.- I’d be able to help out with some of the other pr’s but I let mine languish for so long I’m trying to just focus on finishing the thing.

At some point I thought we had our comparison operators or constructors not quite correct for higher order autodiff. I saw that the autodiff stack was seeing a NaN on there. I meant to hunt it down and file an issue. Will still need to do that.

Good to know I’m not nuts.

Just that the UI / UX is typically pretty bad on developer-facing stuff and devs are used to digging in and knowing the CI systems internals pretty well. I’m sure this isn’t true at the Googles and Facebooks of the world.

I was using the word poorly. I was trying to use modular as “simple” but in the way that Rich Hickey talks about simple here:

I think The Great Flattening will make things simpler by reducing the number of functionally intertwined components. I think it’d be good to keep an eye out for other ways we can reduce intertwining among components in the system (which, as a shorthand, I was referring to as modularity, but I’ll stop doing that because it’s confusing).

I’m happy to look at Mitzi’s code if she wants, she was mentioning that today. But I don’t know that codebase very well yet. And in Math, I wasn’t saying that I have all these great ideas for how to do it better… just that we should be on the lookout for opportunities, even if they’re big refactorings that could cause existing PRs issues. Merge conflicts are annoying but usually fairly trivial compared to everything else we do. We should also aim to keep the number of PRs down as much as we can, but a lot of that relies on other people responding to our requests. Hence me trying to automate the style portion of the process.

Agreed we should have a testing framework that tests different type instantiations. Ideally we’ll rewrite the probability tests to not do code generation at some point and to use the same framework as well.

I plan to get back to it ASAP. It’s holding too many things up now. First up, I’m going to document what’s there.

It just keeps delegating down, so I don’t see why it’d be a problem. There’s an issue with those fvar constructors testing for NaN. It’s inefficient and it shouldn’t be necessary if everything else is organized properly.

That’s why I think it’s a good idea.