The over-arching goal is the standard goal in all of software engineering: modularity. We want modularity because enforcing clean boundaries reduces the complexity people have to deal with in coding, design, and documentation.
I can believe submodules aren't the only way to do this.
Working back from our user-facing distribution goals, we need separate releases for all of:
- Stan math (includes Boost and Eigen)
- Stan language (includes math)
- RStan (includes services)
- CmdStan (includes services)
- ... other interfaces ...
The other logical modules we have are the following.
The interfaces, in their role as clients of Stan infrastructure, need to call the services layer as well as the language compiler.
From a developer perspective, I want to be able to work on just one of these bulleted items and test it independently. The problems we wind up having arise when we need to make synchronized changes to stan-dev/math and stan-dev/stan, or to stan-dev/stan and stan-dev/rstan.
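To make the synchronized-change problem concrete, here is a rough Python sketch that computes which modules sit downstream of a change. The `INCLUDES` table and the `downstream` helper are hypothetical names made up for illustration. The edges come from the "includes" list above, except that treating services as a separate module that includes the Stan language is my assumption.

```python
from collections import deque

# Direct "includes" relationships taken from the release list above.
# ASSUMPTION: "services" is modeled as its own module that includes "stan";
# the list only says that RStan and CmdStan include services.
INCLUDES = {
    "math": [],
    "stan": ["math"],
    "services": ["stan"],
    "rstan": ["services"],
    "cmdstan": ["services"],
}


def downstream(module, includes=INCLUDES):
    """Return, sorted, every module that directly or transitively includes `module`."""
    # Invert the table: for each module, which modules include it directly?
    users = {name: [] for name in includes}
    for name, deps in includes.items():
        for dep in deps:
            users[dep].append(name)

    # Breadth-first walk over the inverted edges.
    seen = set()
    queue = deque([module])
    while queue:
        current = queue.popleft()
        for user in users[current]:
            if user not in seen:
                seen.add(user)
                queue.append(user)
    return sorted(seen)


print(downstream("math"))  # ['cmdstan', 'rstan', 'services', 'stan']
print(downstream("stan"))  # ['cmdstan', 'rstan', 'services']
```

Under these assumptions, a breaking change in math touches everything else, which is why changes there need coordinated updates further down the chain, while a change confined to one interface affects nothing else.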
I looked at the linked article Sean sent, and it left me curious as to just what the scope of the monorepos they talked about is. I have a hard time believing that all of Google's code base is in one repo, unless they mean something else by the word. So where do they decide to break in terms of project scale?
I take the point about having to duplicate code across repos. It's terrible having all that makefile and Jenkins config duplicated (though I don't know how easy it would be to have a master make for all of Stan).