The Flattening Refactor

syclik · April 18, 2019, 3:39pm

Makes sense… What would help is for the whole process to be described in a little more detail, especially how it’s going to affect the everyone (the devs working on this, the reviewers, and the other devs). It’ll be a little easier to help, know how to deal with conflicts, understand timing, etc.

If it’s a little premature now to make that plan, that’s ok.

anon79882417 · April 18, 2019, 9:01pm

@syclik sure, it’s not too premature.

So for devs + reviewers:

It’s easiest for @roualdes and I with this back and forth, but I would appreciate some feedback. We’re trying to be as thorough as possible with the git commit history, so that it’s easy to track what each change was. We can figure out what’s being automated with the flattening script, or searching Edward’s name as commit author.

What needs a closer look is what I’ve had to fix afterward, and I’ve tried to indicate these edits with the git history. Moving forward I’ll put a unique id: script_exception so they’re easier to search for in git commit messages (didn’t do this with prim).

Mostly we’ve been pretty fortunate and these changes weren’t so bad. Here are some hashes on the branch code-cleanup/issue-937-flatten.

This commit is something that’s straightforward:

2d3656859f90921afa926f7af1d9e99dd9e06b33

and here’s some examples where I’m less sure that my changes are valid:

7508f11acbe72000b99f0e7868409f89e516aefd
03746296075ef397f8cec22b97e3071d0285f351
8f33a67a6aa6ba1dcbd12527389ae865005b70c2

I can easily revert all these as they’re disjoint and changes only span one file.

Any thoughts? There’s really only around 10 edits so far after the massive flatten.

~~
As for timing, it’s as fast as we can (together) make sure these changes are valid and make these changes to rev, mix fwd. I’m anticipating these being less trivial.

syclik · April 23, 2019, 4:42am

That’s nice.

Can you point me to exactly what I should look at? This much code flying around makes it hard to navigate.

anon79882417 · April 24, 2019, 12:27am

Ok - so we’re clear, the script worked well for the most part, there’s just a few times I’d have to go in an edit a few lines to make sure it will compile.

Yeah, this wasn’t clear. I’ll be more explicit.

7508f11acbe72000b99f0e7868409f89e516aefd
promote_elements.hpp
This commit had tests that included some var things, which was causing prim’s promote_elements.hpp test to fail. So I removed the tests from prim and moved them to var.

03746296075ef397f8cec22b97e3071d0285f351
is_var_or_arithmetic.hpp
One of the tests started failing after the flattening. I looked at the source for is_var_or... and according to the doxygen, I think the failure was valid, so I changed two of the assertions to EXPECT_TRUE. The ones I changed are on lines 14 and 17.

8f33a67a6aa6ba1dcbd12527389ae865005b70c2
prim.hpp
After we cat’ed the higher includes together, there was a declarative #ifndef in the middle of prim.hpp, so I removed it so that it would compile.

All the other commits were really straightforward, pretty much just editing includes at the top of the files to make the tests compile.

If any of that’s not clear or incorrect, just let me know and I’ll fix it up.

syclik · April 24, 2019, 3:06am

Thanks. Wow, so this is going to be tough. I’m looking at this:

and it’s a bit much to digest. There are also 2471 commits! Are the custom changes only at the end?

anon79882417 · April 24, 2019, 3:10am

Yes! custom changes are also all by me, so you can filter git commits by the committer, to see what I’ve done. There’s really only around 10 so far

syclik · April 24, 2019, 3:19am

That’s great! Thanks for keeping things organized.

roualdes · April 24, 2019, 3:23am

@anon79882417, would you mind double checking this commit?

I think the #ifdef inside prim.hpp is supposed to be there, re math/prim/arr.hpp, albeit the script didn’t get the placement of the #endif correct. Thanks.

anon79882417 · April 24, 2019, 3:53am

[quote=“roualdes, post:28, topic:7932”]
@drezap, would you mind double checking this commit?

yep, will do.

Yep, that helps, I’ll drop it in the right place. Thanks.

Bob_Carpenter · April 25, 2019, 8:51pm

@syclik — it’d help if you provided guidance on what you want here in the way of a process spec.

Are there specific existing PRs you’re worried about?
Are you looking for some kind of testing plan other than the existing unit tests all passing after being moved?
Are we OK just assuming external projects that depend on Stan math will pull in top-level includes? If so, nothing should change from their perspective.
Anything else?

What happens if I have a file I edited and committed on a branch, then someone else moves that file and merges with develop? If I pull that merge, will it just move my file? If so, there’s really not much to worry about.

syclik · May 2, 2019, 2:03pm

I was actively trying to avoid being prescriptive about what is necessary; I’ve been communicating with @anon79882417 about it, but I certainly could have been clearer about it. I’m expecting some plan to be made before executing it all. I don’t know what that plan would be or how detailed it needs to be for it to be enough communication to the rest of the dev team to know what’s going on. All I know is that it will be very difficult if we have a PR of 3000+ changed files that are all legit Math source files. I don’t know how to review that. Doing it all at once will be a mix files being moved together, different includes, and moved tests, so it’ll be a lot of careful review. If there’s any way to split it apart so the review is easier, that’d be a win.

Yes. All existing PRs. And all the work that’s happening now.

Right now, we know that we don’t have full coverage, so the unit tests are the best we have. But maybe some sanity check like counting the number of tests before and after?

That’s right.

In most cases, no. It’ll just be merge conflicts everywhere. This is why I was hoping there’d be a clever way to stage it into smaller chunks that we can roll forward in different stages. (I’d really look at Martin Fowler’s “Refactoring” for hints on how to do stuff like this safely… most of that is written for software changes, but we can also apply some of those techniques to file system changes.)

@anon79882417 and @roualdes: maybe I can suggest some concrete steps:

What if we introduced some headers: stan/math/prim/meta.hpp, stan/math/rev/meta.hpp, stan/math/fwd/meta.hpp, and stan/math/mix/meta.hpp? This would just include the template meta programming traits included by stan/math/*/mat.hpp in the right order (this is techncially some of the hardest parts of the Math C++). And in the same PR, replacing any calls to stan/math/*/meta/*.hpp with those headers. No other changes in this first PR.
Next, merge the meta/* folders together. We should be able to squash everything safely now and only have to change the stan/math/*/meta.hpp files now. If we wanted to split out the header includes, we could do that safely as a different file.

By doing it that way, we would be able to track all the changes to all files outside of meta easily and then we’d have one PR that essentially will cause conflicts for everything in meta, but nothing outside of it.

anon79882417 · May 2, 2019, 2:23pm

@Bob_Carpenter @syclik @roualdes will you be in the stan meeting? I can make it and we can discuss

roualdes · May 2, 2019, 3:22pm

I’ve never attended one, so I don’t know how to join in the fun. Either way, I’m late now, and I don’t want to be rude.

anon79882417 · May 2, 2019, 4:57pm

Hey @roualdes

Hey so @syclik and I talked. Prim works: It compiles, tests pass.

We think it would be best to get this reviewed and merged first. So is it too much to ask to roll back the rev, fwd and mix changes? We can then begin the review process for prim, and other developers can just use the new flattened structure for PRs moving forward.

We might need to do the flattening for rev, mix and fwd in another script. There are some caveats to this flattening. I need to do a bit more fiddling with the library to figure out where we’re getting implicit instantiations, etc.

Edit:

If it’s too much, I can do so via a git reset or something. Actually, this is trivial I got it. I just need to find the commit. I can do this soon

roualdes · May 2, 2019, 6:16pm

This shouldn’t be too bad to revert. All of the commits are sorted in our favor here. So something like

git revert --no-edit --no-commit 722f1be704^..cbbf8d7fcc

will not ask you to edit anything commit messages and allow you to write one commit for all reverts. Double check the range. You don’t mind tackling this?

Bob_Carpenter · May 2, 2019, 7:19pm

I think it’s better to be prescriptive proactively than retroactively.

What plan do you need beyond “we’re going to flatten prim/arr/mat and make sure tests still pass”?

At the functional level. We make sure none of the tests have been removed and we’re done, right? I don’t think this requires anything to be done to review the main source file contents and the only test review that needs to be done is make sure we haven’t dropped tests.

It’ll be easy to review that nothing got dropped, right? But yes, if that’s hard, then just making sure we have all the same tests (that is, find them all, sort them, and check identity at the file name level).

I think that’d be worse than just ripping the bandage off all at once.

Would or will? I love that book, but don’t recall anythnig about moving files around. Some IDEs support refactoring like this, but I don’t know that it’s a big deal to move source files in C++ other than for includes. But if any of the includes break, we’ll know it because things won’t compile.

anon79882417 · May 4, 2019, 12:37am

Hey @roualdes,

Asking you this because of your awesome git skills. I’m doing the command you suggested, back to the commit where we had prim compiling and tests passing.

Git gets confused, and rightfully so, when we have a merge, because I haven’t specified the -M option. When we hit the fork in the revert, I’m assuming we should do git -M 1 because we want to stay on this fork (and not revert through develop).

When I hit this fork, can I assume that the HEAD (or pointer, as I’m thinking about it) is at the commit before the commit, and I can just use one git revert -M 1 and continue the reverts for revert n-1, n-2, … 2, 1?

Does this make anysense?

This issue explains it a lot better: Revert a faulty merge

The naive me wants to do a hard reset but then I can’t push this :)

EDIT: Yeah I just reset --hard, it was much easier.

anon79882417 · May 15, 2019, 2:18pm

@rok_cesnovar I see you’ve been adding tests for programs prim/meta that didn’t have any. I’ve come across some meta things that don’t have tests, such as: as_array_or_scalar.hpp, as_column_vector_or_scalar.hpp, as_scalar.hpp.

Please leave writing tests for these to me. I don’t want to flatten without tests and risk ruining the library

rok_cesnovar · May 15, 2019, 2:58pm

Sure. Go ahead. I forgot about these 3 that were added later. Other than those, the prim/scal/scalar_type_pre is the only missing test in */meta. This efforts are part of this issue https://github.com/stan-dev/math/issues/1122

And thanks!

anon79882417 · May 15, 2019, 3:00pm

If anyone, like me, needs better hobbies and wants to follow along, here’s what I’m working on.

This is after convo w/ Syclik.

We want to use a Seam implementation strategy for parts of this refactor. In summary, Seam implementation strategy introduces new programs to make code more modular, easier to maintain and test. See Feathers’s Working Effectively with Legacy Code Chapter 4 for a more detailed discussion. I think the intro of that chapter also hits on some interesting pedagogical points. Anyway, this is handy for code that is dependent heavy, such as with our template meta programs, everything in stan/...../meta.

What we’re going to do with meta in the flatten is a bit different than the rest of the library. We’ll retain the separate higher level include for meta, named meta.hpp. Files need to be combined in a separate order for meta, that is scal, mat then arr, and I’ll retain the separate higher-level include file for meta.hpp.

Topic		Replies	Views
Flatten shape and autodiff type in stan-dev/math down to one file per function Developers complex-math	7	743	November 30, 2019
Remove tests for row vectors (from the distribution tests within the math library) Developers maintenance	25	1690	November 2, 2017
Barriers to New Developers Developers	42	3205	December 16, 2017
Include order of headers in the Math library Developers maintenance , math	10	1295	November 19, 2017
Standard for reviewing pull requests in the Math library Developers	16	1482	June 2, 2018

The Flattening Refactor

Related topics