Stan language: do we patch existing C++ transpiler or do we wait for the OCaml version?

maintenance

#1

From @Bob_Carpenter in this PR stan-dev/stan#2371:

Are we going to keep patching the C++ transpiler or just wait for the OCaml version for all these fixes? There’s a pile of these things that Matthijs has found already.

I’m curious too. What’s the relevant info we need to make a reasonable decision? Some things that come to mind:

  • What’s the status of the OCaml version? (Where are we at now, and how long do we think it’ll be until it’s actually ready?)
  • Do we want to release a new Stan version with the existing C++ transpiler?

#2

This depends on when we think the OCaml version will be released, which in turn depends on the feature set for the first release and on how much of the already accumulated technical debt we try to pay down.


#3

I’m going to say it’s maybe halfway done. Completion involves parsing the Stan language into the AST, translating the AST into the Middle Intermediate Representation (MIR), and generating code from that. On top of that we’ll need to allocate some time for figuring out distribution for RStan/CRAN, and do a bit of work for PyStan. @Matthijs has the entire language parsing into an AST (with something like 5-8 bugs fixed along the way), and that AST is being finalized today. The translation to MIR is probably 95% done, and the code generation is something like 40% done.
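For anyone not familiar with the three stages (parse to AST, lower to MIR, generate code), here is a toy sketch of that pipeline shape for plain arithmetic expressions. It is written in Python purely for illustration and is not the OCaml compiler's actual design: the node classes, the three-address "MIR" tuples, the temporary names `t0`, `t1`, ... and all function names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple, Union

# Stage 1 output: a tiny AST (hypothetical, far simpler than a real Stan AST).
@dataclass
class Num:
    text: str

@dataclass
class Var:
    name: str

@dataclass
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"

Expr = Union[Num, Var, BinOp]
TOKEN = re.compile(r"\s*(\d+\.?\d*|[A-Za-z_]\w*|[-+*/()])")

def tokenize(src: str) -> List[str]:
    tokens, pos, src = [], 0, src.rstrip()
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parse(src: str) -> Expr:
    """Recursive-descent parser: source text -> AST."""
    tokens, pos = tokenize(src), 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def atom():
        if peek() is None:
            raise SyntaxError("unexpected end of input")
        tok = advance()
        if tok == "(":
            inner = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return inner
        if tok[0].isdigit():
            return Num(tok)
        if tok[0].isalpha() or tok[0] == "_":
            return Var(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    def term():
        node = atom()
        while peek() in ("*", "/"):
            op = advance()
            node = BinOp(op, node, atom())
        return node

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = BinOp(op, node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

# Stage 2 output: a flat list of (dest, op, lhs, rhs) instructions.
Instr = Tuple[str, str, str, str]

def lower(tree: Expr) -> Tuple[List[Instr], str]:
    """AST -> toy intermediate representation plus the name holding the result."""
    instrs: List[Instr] = []

    def go(node: Expr) -> str:
        if isinstance(node, Num):
            return node.text
        if isinstance(node, Var):
            return node.name
        lhs, rhs = go(node.left), go(node.right)
        dest = f"t{len(instrs)}"  # fresh temporary; may clash with user names in this toy
        instrs.append((dest, node.op, lhs, rhs))
        return dest

    return instrs, go(tree)

def generate_cpp(instrs: List[Instr], result: str) -> str:
    """Toy IR -> C++-like statements, one per instruction."""
    lines = [f"double {d} = {l} {op} {r};" for d, op, l, r in instrs]
    lines.append(f"return {result};")
    return "\n".join(lines)
```

Calling `generate_cpp(*lower(parse("1 + 2 * x")))` runs all three stages in order. The point of the middle layer in a design like this is that the code generator only ever sees the flat instruction list, so parser fixes and backend work can proceed mostly independently.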

I think it might be ready in ~3 months or so.

I think we’ll probably end up releasing another version with the existing C++ compiler for the GPU and threading stuff if nothing else.

I don’t, however, think it’s worth spending Columbia or NumFOCUS money fixing bugs that we’ve only discovered through the extensive test suite we’re (well, mostly @Matthijs) developing for the new compiler, unless a user chimes in to say they have actually run into one.


#4

If that’s what we’re going to do, can we:

  1. continue to file issues
  2. mark certain bugs as “won’t fix” (or some other label that makes it clear)?

I’m indifferent about fixing the bugs that our users aren’t catching. It’d make sense to patch the bugs that our users are running into. (I’m going to assume we’ll slip a little from the time estimate and that 3 months is really a lower bound.)


#5

Agreed. I guess that makes sense for Stan 2 repo issues; you could think about them as applying just to the Stan 2 compiler codebase. We might want to use a milestone or something similar to indicate that we will soon be releasing code that fixes them.