Missing copyright and licensing for stanc3 PRs

For all of our PRs for Stan, we’ve been collecting

  • declaration of copyright holder
  • open-source licensing for code and doc

In the stanc3 repo, there are PRs getting merged without this. For example:

@SGB: I would propose we not release a version of Stan using stanc3 until we get clear copyright assignment. I don’t think it’ll be too late to ask all the people who made PRs to add licensing releases to their PRs.

@seantalts: can you get all the PRs tagged with licensing by the person who submitted them? I know that’s 400 PRs, but I can’t think of an alternative that’ll clear up our IP.

Obviously, I think this should be licensed as liberally as possible. I was always under the impression that everything we did was BSD-3 because we had a BSD-3 license declaration in the top level folder of the project. I didn’t realize that every single PR needed to be marked as such.

Same - Bob, can you go into the mechanisms here? What do we get by marking each PR individually? Is there case law precedent or other material that you’ve come across that you could link to so that I can learn more?

I was under the same impression as @Matthijs. I went and added the requested info to my 6 PRs anyway.

The top-level license file can determine the license, but not whether the person has the right to submit code under that license. Thus each person still needs to say who owns the copyright for their PRs, and it is then assumed that they have checked that they have the right to submit that code. It’s quite common for copyright to be owned by an employer instead of the individual (unless it’s non-work-related hobby code). I think it’s good to make this explicit, and if it is asked for on each PR, a person is more likely to update it if their employer changes.

Can we do like some other projects I’ve seen and gather CLAs just once per person?

I looked into it a little more - it looks like CLAs are not a great idea, but that Matthijs and Rok and I had the right idea:

Specifically this section,

For a long time, the basic norm of open source licensing has been inbound = outbound, meaning that a contributor contributes to a project under the terms of its open source license. Think about it logically: if I submit code to be incorporated into an open source project, with the knowledge (constructive or actual) that it will be distributed under a certain license, I must intend that code to be licensed similarly. Sure, there’s still that “well they could argue…” factor, as with digital signatures, but more recently, GitHub made that relationship explicit by incorporating an inbound = outbound clause into its Terms of Service, meaning if a pull request is submitted via GitHub, it is submitted under the project’s open source license.

@Bob_Carpenter does that resolve your concerns?

I was assuming inbound = outbound, but that doesn’t help with figuring out who owns the copyright.

From GitHub’s TOS:

Whenever you make a contribution to a repository containing notice of a license, you license your contribution under the same terms, and you agree that you have the right to license your contribution under those terms. If you have a separate agreement to license your contributions under different terms, such as a contributor license agreement, that agreement will supersede.

Isn’t this just how it works already? Yep. This is widely accepted as the norm in the open-source community; it’s commonly referred to by the shorthand “inbound=outbound”. We’re just making it explicit.

(emphasis mine). This resolves the copyright ownership question, I believe.

But if my copyright owner is my university I can’t certify that.

Not sure I totally understand what you’re getting at - the only distinction here is that now, by using GitHub and submitting code, you are implicitly agreeing to something that we ask people to manually certify on every PR in some of the other Stan repos. Any issues you have with your university exist in both realms as far as I understand it; it’s the same thing agreed to in different ways. In fact, in the article linked above, they mention why even lightweight manual certification like ours is a bit hostile to developers.

Writing out who owns the copyright of the code submitted provides additional information that is not present through the PR mechanism.

Since the owner may not be the individual and the copyright owner of different PRs for that particular user can change between PRs, it seems like we need to record this?

This information is relevant if there’s ever a need to relicense the code. For the projects that are BSD, I think it’s a lot easier to relicense without necessarily having permission from every copyright holder (I’m not a lawyer, so please don’t take this as a confident statement), but this isn’t the case for some of the other repos. Note: it’s the copyright holder, not the GitHub submitter, who would need to give permission for a license change.

Cool! Were those terms of use in place for all the PRs in stanc3?

You can, by asking the university or by making sure you have the right to contribute to open source as part of your employment agreement. Even at Columbia, rights vary by position (research scientists and regular (research) faculty have different obligations than students or teaching faculty).

This has always been a problem for us and other projects. I’m not sure a lot of academics understand that their university usually owns their code. Most universities are OK with open-sourcing things, and many have moved to making that an explicit right of faculty.

Don’t we still need to know who owns the copyright? It might not be the contributor, like for those of us in research positions at Columbia.

You mean the article “inbound = outbound” from opensource.com? They don’t say why asking for the copyright holder and open-source license is intimidating, and they don’t say anything else about it that I can see. Is there some other hostility you’re talking about?

I’m not sure if what we’re doing now in other modules would be considered minimalist or maximalist or somewhere in between by that article. A lot of it’s about GPL and doesn’t seem relevant.

Can we just go and collect a list of who owns copyright from our contributors and also let them know they’re already BSD/CC BY licensing in case they didn’t realize? I hadn’t realized GitHub had such a clause in their terms of service.

I looked up what some other projects were doing on the licensing/copyright front. It’s actually hard to figure out. From most to least explicit on requirements for contributors:

  • TensorFlow requires a (corporate) CLA

  • PyTorch takes the strategy we do in leaving copyright with individual contributors; some of its modules require a Facebook CLA, but its top-level instructions say nothing about licensing other than that it’s licensed with copyright to the individual contributors with some listed specific side cases.

  • The Boost.org policy is to only accept libraries with a clear copyright notice that meet the License requirements.

  • Eigen’s contributing page is unclear on licensing or copyright terms for contributors; the project is clear the code’s MPL-ed.

  • PyMC3 doesn’t say anything about licensing in its contributor notes, but like PyTorch, lists copyright as belonging to individual contributors and is hosted on GitHub, so maybe they’re just assuming GitHub’s terms of service.

  • Scikit Learn’s contributing page also doesn’t mention copyright, so presumably they just go with GitHub’s terms of service; their copyright notice says copyright belongs to the developers.

Great point.

Yep, https://github.com/github/opensource.guide/pull/340.

I was talking about this article: https://ben.balter.com/2018/01/02/why-you-probably-shouldnt-add-a-cla-to-your-open-source-project/ But I think the Stan system actually isn’t hostile the way a full CLA is.

Sure, this works for me. Would it also work to just ask them in the future if we needed to relicense? I have no idea, but I wonder how much effort we should put into preparing for relicensing ahead of time. Is it a somewhat likely event at some point?

No, I don’t think we’ll be changing licenses unless there’s some kind of legal upheaval.

That’s a huge relief.
