RFC: Return of the Monorepo

I don’t know if we’ve discussed this, but does it make sense to have the monorepo cover just stan-dev/stan and stan-dev/math? That way CmdStan, RStan, and PyStan are treated the same.

Yes, we discussed that.

I’d be more inclined to bring RStan and PyStan into the monorepo at this point. The separate repos have been nothing but a pain for testing with upstream dependencies.

With PyStan 3 (httpstan, technically), I’ve stopped using a submodule
for stan and just committed the relevant Stan release tarball into the
tree. It works fine so far.

How do you keep up with the develop branch of Stan that way?

When a new Stan release requires changes, those changes are made
at the same time as the new Stan code is added to the tree.

The mono repo is great, but I think we should only include the cmdstan interface, because

  1. we should have a frontend in order to test all our layers … including the top user-facing interface
  2. cmdstan is very light in terms of additional dependencies to have
  3. cmdstan is to me the “vanilla” stan - and it’s a good thing to have a reference

However, if we include cmdstan in the mono-repo, how can rstan and pystan make this mono-repo a submodule without the cmdstan stuff?

If that’s only official releases, does that mean you don’t keep up with the develop branch on stan-dev/stan? I guess that’s pretty stable.

That’s the plan to start. CmdStan is lightweight enough I don’t see a problem bringing that in along with the submodules in R and Python. Sounds like Allen isn’t even using submodules for PyStan.

Right, PyStan does not really need to update the Stan source code
between releases. If there is a new Stan feature that requires (early)
testing with development Stan code, I think we’d just do this in a
separate branch.

Thanks, @wds15. I have similar concerns.

I don’t want to elevate CmdStan above RStan and PyStan (or demote it for that matter). I think they should be on equal footing. I’d rather they were all in or all out. I treat the interfaces that use CmdStan as a different class of interface since they don’t write any C++ code directly.

The only things that could break at the CmdStan level when updating Stan happen rarely:

  1. changes to build instructions
  2. API changes

The behavior of the samplers is tested at the Stan level and should be kept there.

In my mind, it’s easier if Stan and Math were merged together leaving CmdStan, RStan, and PyStan as separate repos. But I could be convinced that only CmdStan should be there (against what I think is natural) or that all three interfaces live together. It’d be great to have comprehensive tests across the interfaces, but we’ve never gotten that going.

One more point to consider (sorry if already raised) is the reason we split stan-math apart from stan. As I understood it, a key advantage of this setup was that the test burden was much decreased for anything upstream from stan-math.

In practice this means that we do not have to run distributions tests whenever we change things in the language.

Can we still have this convenience with a mono-repo? If the answer is no, then this is a hefty price we are paying here.

We will be able to configure testing separately from git history, so we can avoid testing distributions every time we change anything.
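
A minimal sketch of what path-based test selection could look like, to make the idea concrete. The directory names (`math/`, `stan/`, `cmdstan/`) and suite names are hypothetical; nothing here reflects an agreed monorepo layout or the actual CI setup.

```python
from pathlib import PurePosixPath

# Hypothetical monorepo layout: top-level directory -> suites to run
# when a file under that directory changes. All names are assumptions.
SUITES_BY_DIR = {
    "math": {"math-unit", "math-distributions", "stan-unit", "cmdstan-e2e"},
    "stan": {"stan-unit", "cmdstan-e2e"},
    "cmdstan": {"cmdstan-e2e"},
}
ALL_SUITES = set().union(*SUITES_BY_DIR.values())


def select_suites(changed_paths):
    """Return the set of test suites to run for a list of changed file paths."""
    suites = set()
    for path in changed_paths:
        parts = PurePosixPath(path).parts
        top = parts[0] if parts else ""
        # Anything outside the known directories (top-level build files,
        # CI config, ...) conservatively triggers every suite.
        suites |= SUITES_BY_DIR.get(top, ALL_SUITES)
    return suites
```

In this sketch a change under `stan/` (for example, to the language) skips the distribution tests, while a change under `math/` still runs everything downstream. The list of changed paths could come from something like `git diff --name-only` against the target branch.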

I still think that CmdStan:

  • is fairly simple, something similar to an “example” or bare bones interface representing the minimal complete Stan implementation
  • from a people perspective is grouped with Stan and Math
  • is required for end-to-end testing

and so we should include it with the other two repos.

That’s enough of a justification for me to be behind it. I don’t think anyone is trying to make CmdStan more than that, so we’re safe for now.

CmdStan is different than RStan and PyStan in many relevant ways:

  • BSD license
  • pure C++
  • zero dependencies outside of those in stan and math libs
  • [from @seantalts] same developers as stan and math (for now anyway)

What’s the advantage of not including CmdStan in the monorepo? I like that it gives you an end-to-end functioning package.

That was a motivation. But we were wrong. This has hugely increased the test burden now that we do upstream tests which we can’t synchronize.

Right, but we do run the other way around, which is much more common.

Yes, but I think we may just go to more combined testing because it’s too hard to keep all the dependencies in place otherwise.

Heads up - I’m working with a contractor who is ready to begin this work. I think it might be a reasonable idea to freeze merges to develop(s) for a week while he finishes? Is that feasible? @Bob_Carpenter

Freezing merges should be ok. I think a week would be fine, but @Bob_Carpenter can reply.

If it doesn’t go as planned, we can always merge into wherever and then reapply the merges afterwards, right? Hopefully we won’t have too many that would need to be reapplied.

Yeah, it should be doable, with just a few one-offs if we mess up. We’re not super high traffic at the moment.

Cool. And I’m definitely not opposed to a freeze for a week.

Found this today:

(starting from this: https://stackoverflow.com/questions/33569189/convert-git-repo-with-submodules-to-single-repo)

Yes, that should be OK. What happens to the PRs in progress now?

Thanks—that looks like the kind of thing we need.