Stan Governance

One reason we set the repos up this way is that stan-dev/stan and stan-dev/math are maintained separately. And the reason we were proposing to split stan further is so that I could manage the language part, @betanalpha could manage the sampler part, and we could have a fight over who managed the services.

P.S. Sorry if you caught my cold. It’s taking forever to clear up. I still sound like I did.

I just don’t think the repo-level permissions feature is worth the headache of splitting issues, wikis, history / versioning, etc. across so many sub-repos. Every one of these needs its own infrastructure, and we always have headaches trying to change things, test, and share information across repos.

My opinion: monorepo isn’t right for us.

We don’t have the tooling of Google to help us out. A monorepo couples and syncs projects that don’t need to be coupled. Clearly, there are also good reasons for it, but I think for our organization, it isn’t right.

There are lots of things here:

  • wiki. It’s amazing there isn’t an org-level wiki on GitHub yet. One way we could fix this is to create an org-level wiki and turn off the wiki on each of the repos.
  • issues. I think this is a very strong reason to have separate repos. Yes, we do have certain issues that span across multiple repos (adding a new language feature), but we have plenty of things that need to be fixed on just one project. So I’d rather keep this separate.
  • versioning. Separate is better. There’s gonna be a Stan release at some point where the math library hasn’t changed since the previous version.

+1.

I think the biggest practical reason for not using a monorepo is permissioning. GitHub doesn’t allow us to set permissions at the folder level. We have different processes for the different projects, with different testing requirements. I think it’d be easier if we had a shared process, or even a minimum level of process that’s adopted everywhere.

But anyway, I’m against the idea of a monorepo for Stan (what does that mean exactly? Math, Stan, CmdStan, PyStan, RStan? Do we include RStanArm and BayesPlot? Loo?). We started that way. We changed for good reasons.

You actually need more tooling to support multi-repo stuff. Take a look at Jenkins or the issue copying tool or our submodule headaches… the list goes on. Git and the tools around it have a ton of features that break with separate repos: wiki, git bisect, aligning history and versions across repos, releases, build tools and makefiles, etc.

Permissioning is the only thing we would gain by using separate repos. Tooling is heavily pro-monorepo in all discussions of this that I’ve seen.

Are you suggesting merging math, stan, and cmdstan?

How would these then relate to rstan and pystan?

How would testing work?

How do releases work?

How would a user check out and commit to a particular piece of the monorepo, like the math lib?

I assume there’s no problem doing this with Git even though Facebook’s touting Mercurial and Google’s touting Perforce (at least in the things from a few years ago I’ve seen about monorepos).

Here’s something more recent about Google’s monorepo:

We’ve totally hijacked this thread. @seantalts, if you’re serious about this, could you open a focused topic?

I don’t have the stamina to actually suggest switching things to a monorepo. For now I’d just like to register the objection to splitting things out further into more repos.

Sorry about the cold. It’s been pretty nasty.

No worries. Also I just discovered that github has subdirectory permissions already built-in: https://help.github.com/articles/about-codeowners/
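
For what it’s worth, here’s a rough sketch of how CODEOWNERS-style rules could map subdirectories of a combined repo to maintainers. The directory names and team handles are made up for illustration, and the matcher only handles plain directory prefixes, not the full pattern syntax GitHub supports. Also note that CODEOWNERS drives review requests (and, combined with branch protection, required approvals) rather than write access as such.

```python
# Hypothetical rules: (directory prefix, owners). The team handles and
# directory layout are assumptions, not an actual Stan configuration.
# As in a CODEOWNERS file, a later matching rule overrides earlier ones.
RULES = [
    ("", ["@stan-dev/core"]),              # fallback owner for everything
    ("math/", ["@stan-dev/math"]),
    ("stan/lang/", ["@stan-dev/language"]),
    ("stan/mcmc/", ["@stan-dev/samplers"]),
]


def owners_for(path, rules=RULES):
    """Return the owners from the last rule whose prefix matches `path`."""
    matched = []
    for prefix, owners in rules:
        if path.startswith(prefix):
            matched = owners
    return matched
```

With these made-up rules, a change under `stan/mcmc/` would go to the sampler maintainers, and anything not covered by a more specific rule falls back to the catch-all owner.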

Given that, I’m not sure there are any further arguments for splitting the repos out, but it’s possible I haven’t been the most diligent in cataloging them.

In the plus column for the monorepo: we lost history in a lot of the files in the Math library. Although, you can chalk that up to a decision made out of convenience. When we decided to split Math from Stan, I looked at how difficult it would be to keep the whole history of Stan in Math; it wasn’t easy. We (group decision) decided it was ok to not have history and so we started a fresh repo at that point.

I’m still not buying the monorepo. It seems harder to deal with testing; we’d have to inspect each pull request to only test certain sub-directories of the code. If you wanted to just test everything every time, we could set that up really easily now, so I don’t see the advantage there.
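
To make the “inspect each pull request” step concrete, here’s a minimal sketch of picking test suites from the files a pull request touches. The top-level directory names and the suite mapping are assumptions about how a combined repo might be laid out; it also assumes downstream suites should rerun when something upstream changes.

```python
# Hypothetical monorepo layout: top-level directory -> suites to run when
# something under it changes. Downstream suites are included on the
# assumption that stan builds on math and cmdstan builds on stan.
SUITES_BY_DIR = {
    "math": ["math", "stan", "cmdstan"],
    "stan": ["stan", "cmdstan"],
    "cmdstan": ["cmdstan"],
}
ALL_SUITES = ["math", "stan", "cmdstan"]


def suites_to_run(changed_files):
    """Pick the test suites for a pull request from its changed paths."""
    selected = set()
    for path in changed_files:
        top = path.split("/", 1)[0]
        if top not in SUITES_BY_DIR:
            # Unrecognized path (e.g. a shared makefile): test everything.
            return list(ALL_SUITES)
        selected.update(SUITES_BY_DIR[top])
    # Keep a stable upstream-to-downstream order.
    return [suite for suite in ALL_SUITES if suite in selected]
```

The fallback to running everything on an unrecognized path is the conservative choice; it is also the piece that would need maintenance as the layout changes.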

One key downside to using git for a monorepo: there’s no narrow checkout of git repos. By that, I mean that if I only wanted to work on part of the repo, say the math library, then I have to check out the whole repo. There’s some Stack Overflow discussion about this.