Moving stan2tfp out of stanc3's repo

We discussed this during the last monthly language meeting

In short, stan2tfp has not really been updated at the same speed as Stan has. Most of the time, when a new feature is added to Stan it is simply ignored in stan2tfp and replaced with a not-yet-implemented error. This mainly serves as an additional barrier to adding to stanc3’s main C++ backend.

Thanks to the rather expressive language dune provides to libraries, it is very easy to split out stan2tfp into a seperate repo without any additional code re-use. I’ve done that here:

This uses GitHub - WardBrian/stanc3 at extract-tfp as a git submodule. As seen here besides deleting the stan2tfp, the only required changes are 4 one-liners to the dune configs.

This would make the relationship between stan2tfp and stanc3 more like that of cmdstan/stan/stan-math - stanc3 would be able to move along while stan2tfp stays pinned on it’s current version until someone intentionally updates it.

How does this sound? Specifically want input from @seantalts and @Adam_Haber, but regular contributors to stanc3 (@stevebronder, @rok_cesnovar, @nhuurre) would also be good to hear from.

If we want to do this, it would be very easy to transfer ownership of the above repo, add a bit more to the README, and call it done. I think @serban-nicusor would need to do some cleanup/moving around of jenkinsfiles and github actions to get CI working on the new repo, but probably nothing too crazy.

This also would serve as a great template for how to use the stanc3 abilities from outside the main stanc3 repo, for anyone looking to write (for example) a language server or separate back end.

3 Likes

I agree with the reasoning and the proposed plan.

1 Like

I think we want to move in the opposite direction and integrate the repos you mentioned. If there are tests that we aren’t using anymore and are becoming burdensome, we should not run the tests anymore or even delete them. We shouldn’t conflate source control, which serves to version the universe, with testing or deployment. These are separate tools.

Could you elaborate on this a bit more? The goal of splitting them is to prevent stan2tfp from being a barrier to further development of the C++ backend without also filling stan2tfp with failwith "TODO" statements. The additional tests aren’t really the concern, more that compiling stanc3 currently also compiles this separate backend which is at a very different level of completeness and support. This would allow stanc3 development to continue at its own pace with the maintenance of stan2tfp being a separate and discretized task.

I think we can achieve that by choosing to separate the backend into another (dune) executable within the same repo, for example. We can choose not to build that by default any more. Versioning is an orthogonal concern from what is built or tested.

Could you link to an example of the failwith issue? I’m not sure why adding to the C++ math library would cause the need to update stan2tfp.

That’s fair, and I’m reasonably certain that would be possible. My argument toward seperating the versioning would be in that approach, we would almost certainly (and, I think, rather quickly) end up with a version of stan2tfp in master which does not compile, because the files in frontend/middle/etc have moved on.

The submodule approach is nice because, while the submodule may be out of date relative to master on stanc3, it would always be the case that the separate repo would build, even if it is for a prior version of Stan.

It’s not the stan math library that is the concern, rather any time the MIR is updated.

There’s a bunch that already doesn’t get supported, e.g:
one
two
three

and any time a PR like this one is made it has to update the TFP files to keep the repo in a compilable state

Some features, like Complex numbers, never got explicit support, so they most likely just do completely the wrong thing (e.g this line I think will just turn any expression like 4+3i into a float)

Edit: Edited so the above links don’t expand into huge github previews

I tend to agree; I’m not sure about stan2tfp being a barrier, but it’s pretty clear that it’s (a) not being used and (b) not being developed/maintained anymore, in which case keeping it within the main stanc3 repo doesn’t make much sense, to me at least.

I think this could potentially be good for someone who wanted to develop it further as well. By moving it out you sort of treat stanc3 more like a black box and it makes it much clearer what things are within the purview of the backend, so someone interested in working on it could do so without the additional overhead of the rest of stanc3 around.

I saw on the github forks of stanc3 that a few people have been working on a Pyro/Numpyro backend, and I think they could similarly benefit from using this style of encapsulation. At the moment it’s impossible without the dune config changes to make things like the ‘middle’ library possible to use from outside the project

I’ve just updated GitHub - WardBrian/stan2tfp using git-filter-repo to preserve the history of the commits on the two relevant folders. It now has ~200 commits so it actually shows history (and things like git blame work), otherwise the result is the same.

If we just turn off the tests that fail and don’t compile stan2tfp by default then eventually we’ll just have a stan2tfp that doesn’t really compile, or am I not thinking right about how that would go? I’m not normally pro - “breaking up things into separate repos” (I find the stan <- stan_math structure annoying) but here it seems like a good solution since we don’t want to pay the technical debt of stan2tfp. @WardBrian does this also break up stanc into separate public libraries? That would be great as well since that would be a big step to getting a doc website up for stanc.

I vote yes, seems good!

It just declares each module we already have as public. It does therefore allow you to build with odoc— setting that up and making some better doc comments was going to be my next repo hygiene project actually!

1 Like

@gbaudart pointed out to me that is is possible to use opam to vendor stanc3 rather than using a git submodule. You would essentially install stanc as a opam library using opam pin stanc path/to/repo/here.git#optional-branch, and after that the rest is identical.

This is indeed possible, it is just a question of preference between this and a git submodule. I think I prefer the submodule approach because it lets you target specific commits, but either could work just as well.

Hi,
To be fair, you can also pin a particular commit with:

opam pin stanc  path/to/repo/here.git#commit-sha

But unfortunately, you cannot easily declare such a dependency in the opam file (without explicitly tagging a version).

Using a submodule is a perfectly reasonable solution.

1 Like

I definitely support opening up stanc3 so that others can build on top of it in such a way. I think including at least one other backend in the git repo and build is a good idea so that we can use it to remind ourselves that that is one of the design goals. It can be easy to forget why we have a MIR and hard to tell at what level of IR certain constructs belong, at least without some concrete examples. It didn’t seem like the updates to stan2tfp were particularly burdensome to me, at least weighed against the benefits of a ready entry point for further tfp development and a working example of an alternative backend. Seems like I might be in the minority here but that’s my $0.02.

1 Like

at least one other backend in the git repo and build is a good idea so that we can use it to remind ourselves that that is one of the design goals

I agree with this, I am just not convinced stan2tfp is serving this goal. I think having a second backend in the repo that is being consistently ignored and is very incomplete sends the opposite message. If we reached a point where stan2tfp (or any other backend) could compile the majority of the tests we’re running stanc3 against, I think it would be great to officially support it by moving it into the same repo and requiring feature parity in new PRs.

I think it being in a separate repo still serves as an entry point for further development and actually is a better working example than if it lives in stanc3. If someone wants to build a new backend for Stan, I think “it should live in the stanc3 repo from birth” is the wrong idea to be communicating. Showing how it is possible to use the stanc3 resources externally is a super valuable example, IMO.

You do raise a good point on the MIR, etc, but I’m not convinced stan2tfp being in the main repo has prevented us making rather C+±based choices there in the past. I think this is ultimately a code review issue, not a repo structure one

2 Likes