Git Collaboration Model

Hey all,

I want to propose some changes to our collaboration model. I’m working on another broad proposal cribbed largely from the Producing OSS book around how to induct new maintainers, but for the moment I have a quick proposal to make our lives easier.

We have been asking contributors to put their patches on a branch from our main repos before we consider their pull request, I think due entirely to some historical technical difficulties around how to use git to collaborate with people who are working on a fork. That requires someone to add the contributor to the Stan organization on github and give them a broad array of permissions across our repos for what is in many cases their first pull request. I think we can avoid this and would propose that we use other criteria to induct new members to the organization (proposal to follow separately in the coming days).

I have fixed Jenkins so that it will build and test pull requests from forks, so that part shouldn’t be an issue anymore.

To locally pull someone’s fork’s branch into your own, you can just pull from it into your current branch:
git pull https://github.com/COLLABORATOR_USERNAME/REPOSITORY.git BRANCH_NAME

If you want to make it slightly easier to work with this person’s fork, you can give their fork a name:
git remote add FORK_NAME https://github.com/COLLABORATOR_USERNAME/REPOSITORY.git

Then the pull, checkout, merge, etc commands will all work with the FORK_NAME above:
git pull FORK_NAME BRANCH_NAME

@Bob_Carpenter and @syclik, how does this sound? Can we move to this system? I’m more than happy to help you guys with the technical details around git and whatever other issues we have as I think that will be more sustainable long-term than the current system.

1 Like

@seantalts, I’m all for it, but it doesn’t solve the actual problem. I’ll describe the problem and hopefully we can find a solution around it.

The reason for having contributors add to a branch was to make permissions easier. Almost all contributions to the math and stan libraries from external developers needed (and still need) a lot of core developer help to get it to be incorporated into the library. The help usually came from myself or @Bob_Carpenter , but occasionally help would come from @bgoodri, @rtrangucci, @randommm, or someone else on the core team. When the branch is on stan-dev, we all have permissions to make changes there directly. When the branch is on a fork, individuals must be given permissions to work on the fork (unless permissioning has changed with GitHub recently).

Now, with reviews for GitHub pull requests, this might be less of an issue. We used to have to indicate what file and line number to make changes. Now we can just mark up the pull request. But, it has always been much easier to go onto branches and fix minor things as it comes up. Things like typos, cpplint, and doxygen. In the past, I’ve taken the proactive approach of helping new and existing developers get their pull requests up to speed so we can get things merged in. I understand that things like style can be very confusing and feels like an arbitrary set of rules until you read enough. For new contributors, I didn’t want this to be the issue for not being able to contribute. Without helping out, I think we would have lost a few pull requests.

So, the problem is the balance between:

  • permissions for having existing core developers help with branches.
    • Having a pull request come from a branch on stan-dev means any one of us can help it along, but requires a little more administration.
    • Having a pull request come from a fork means that it’s easier to submit, but if any one of us needs help out, there’s an administrative burden placed on both the team member and the submitter. Either the submitter can give permissions to the individual team member or the team member needs to submit a pull request, which requires branching from the fork, then going through the process of submitting a pull request from that branch to the fork, then the submitter needs to review and merge the pull request.
  • how much the team helps with the pull request
    • By putting in on a branch under a stan-dev repo, we can help it along. From the past, this has been absolutely necessary to get new contributors, at least to Math and Stan. Right now, we’re getting contributors that are statisticians and scientists adding features that help them with their projects, so typically C++ isn’t their first language. We are not getting C++ experts (with the exception of maybe @vincent-picaud and @seantalts) to contribute solid code that adheres to standards.
    • By having a pull request on a fork, the easiest thing we can do as a team is to review the code and point out problems. It’s not natural to provide help and support through pull request reviews, but it’s not impossible–it just takes a lot more effort on the part of the core developer that’s reviewing. I find it easier to help new developers along and not let cpplint be the limiting factor to submitting a pull request, but we could get the opinion of newer developers to know whether the direct help is even useful.

Regarding Jenkins: we always had a workaround. It’s essentially adding the sha from the fork to a branch on stan-dev and that’s enough to get all the upstream tests working.

All that said, I’d rather make things easier for the contributor. We also don’t get that many new contributors, but our heavy-ish process can’t be helping. Of course, being written in heavily templated C++ can’t help either.

Thoughts?

I see, I didn’t explicitly address the case where you want to edit the branch before merging or go back and forth with them.

Here’s how to pull their changes onto a branch (this preserves authorship and history at the commit level):

git checkout -b some_branch #whatever you want the new branch to be called, could be the same as their name
git pull https://github.com/COLLABORATOR_USERNAME/REPOSITORY.git BRANCH_NAME
<edit files>
git commit 
git push origin some_branch

Now you can make a pull request with this branch. Alternatively, if desired the collaborator can pull these changes into their branch:

git checkout BRANCH_NAME #whatever branch they were working on before
git pull https://github.com/stan-dev/REPOSITORY.git some_branch
<edit files>
git push origin BRANCH_NAME

Then this updates their pull request with your commits and their new ones.

I think this addresses that use-case, no? Git is designed for this sort of decentralized collaboration and I think this use-case is relatively common so it shouldn’t be too hard for new collaborators. We can copy parts of this thread into a wiki too if we end up getting questions about it.

I’ve always been good with forks and submitting pull requests back. Git does handle this reasonably well as long as you know from and to and can keep track. Once again, I’m all for it, but I don’t think it makes the overall effort easier.

So old (git branch on stan-dev):

  • someone gives new contributor permissions
  • branch from stan-dev, pull request made
  • dev team is able to help, pull request updated automatically

New suggestion:

  • new contributor just creates a pull request from the fork
  • in order for a dev to directly help, dev must check out fork, make changes, then either create a pull request back to the contributor or communicate with the contributor that there’s a branch for them to merge
  • only on merge is the pull request updated automatically
  • if another dev wants to help, needs to repeat. if there are additional minor changes to be made by a dev, needs to go through a pull request step again.

Yes, but it introduces confusion on the part of existing devs. It would be a non-issue if pull requests were pretty good on the projects with the heavy process, but they’re usually far from it.

sorry – it’s late and it may not have come off as positive as I wanted.

@seantalts, you should do it.

This sort of git usage will be beyond a lot of members of our team, but it will make it easier for new contributors to try. We should go for it since the help for new devs is coming from a concentrated group within the team and I think they will be able to do this easily.

Whatever makes your life easier is fine with me as long as you can help me understand this wall of Git.

1 Like

They don’t have to clone the fork, they just check it out using the same number of git commands that they’d use to check out a branch the person had pushed to the core stan-dev repo; just with an extra argument indicating which fork (or “remote” as git calls it) it’s from. I think it basically amounts to adding an extra argument to some of your existing git commands. And possibly giving that fork a name, if you work with it enough that that’s useful.

1 Like

That works for me. I’m happy making your proposal the default from contributors.

Okay. Well this is something that’s easy to reverse, so maybe let’s try it out for a while.

1 Like