I’ve been thinking about this and advocating for it for the past couple of months, and it’s probably a good idea to formulate the plan as a roadmap (soon to come!). I’ll take this opportunity to write up, briefly and informally, what I think might be a good starting point for a roadmap for the new Stan compiler and what it enables. [x] indicates work that’s already done (often by @Matthijs!), [ ] is work that no one has started, and [XX%] is work in progress by someone and roughly XX% done. Skip to the last section to focus on the work on symbolic derivatives for Stan functions that can start right away!
Stanc3 vague draft roadmap
Phase 1: stanc3 replicates all Stan 2 functionality
[x] Parse all valid Stan 2 models
[x] Reject all invalid Stan 2 models
[x] Output decent error messages that seem at least as good as those from Stan 2
[x] Define the final Abstract Syntax Tree and an s-expression representation of it as a third-party hook
[95%] Define the Middle Intermediate Representation (MIR) on which we will run optimization passes and output C++ code
[ ] Use static & data-flow analysis to annotate the MIR with dependency information indicating which statements affect later statements
[ ] Construct optimization passes like Common Sub-expression Elimination, Loop-invariant Code Motion, and peephole algebraic optimizations (e.g. `log(1-x)` -> `log1m(x)`)
[ ] Use @mgorinova’s work to analyze the proper blocks for Stan statements to be in for maximum performance
[45%] Emit a Stan model in C++ that uses the Stan Math library (Sean, Bob)
[ ] Emit Tensorflow Probability or PyTorch Python code from the MIR
[ ] Use symbolic differentiation on Stan functions if possible to define analytic gradients automatically.
[ ] Move as much of the Math library as possible into Stan code, wherever leveraging the symbolic derivatives is equally performant, so that we can do whole-program optimization over the MIR of the user’s Stan program combined with the Stan Math library
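To make the peephole idea concrete, here is a minimal sketch in Python over a toy tuple-encoded expression tree. The encoding and the function names are hypothetical, purely for illustration; the real pass would operate on the stanc3 MIR in OCaml.

```python
# Toy expression IR: a node is the variable "x", a float constant, or a
# tuple ("op", arg1, arg2, ...), e.g. ("log", ("sub", 1.0, "x")).
# This encoding is made up for this sketch; it is not the actual MIR.

def peephole(e):
    """Rewrite a single node if it matches a known algebraic pattern."""
    # log(1 - x) -> log1m(x): cheaper and more numerically accurate in Stan Math
    if (isinstance(e, tuple) and e[0] == "log"
            and isinstance(e[1], tuple) and e[1][0] == "sub" and e[1][1] == 1.0):
        return ("log1m", e[1][2])
    return e

def optimize(e):
    """Apply the peephole rules bottom-up over the whole expression tree."""
    if isinstance(e, tuple):
        e = (e[0],) + tuple(optimize(arg) for arg in e[1:])
    return peephole(e)
```

A pass like this composes naturally with CSE and loop-invariant code motion: each pass is a tree-to-tree (or MIR-to-MIR) function, so they can be chained and reordered.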
Phase 2: Compatible Stan Language additions
Nothing here is done yet, but we have some goals:
- Allow users to more easily define and share libraries of Stan functions and models (model fragments? structs?)
- Make it easier to deal with incoming categorical data (dictionaries?)
- Add user-defined gradients, at least for first-order derivatives
- Add a mechanism for annotating Stan code (e.g. for grouping parameters, sending things to the GPU, outputting variables or suppressing their output, function purity, broadcasting, etc.)
Phase 3: Backwards incompatibility
We’ll need to remove certain things that are currently valid Stan to aid in the development of future features. This will include all things currently marked “deprecated” by the latest Stan 2 compiler, but otherwise this list is pretty up in the air. The only certainty seems to be that we need to change where the array size goes when declaring an array. I’d also like to consider removing arrays of doubles from the language. (@Bob_Carpenter what else goes here? I’m blanking)
Adding symbolic differentiation
To add this, we’ll need to access a symbolic differentiation library from OCaml (so that it can operate on the MIR). One likely candidate is SymEngine, the C++ backend that drives SymPy. There is a half-done OCaml binding for it here that we could try to leverage. I am looking for recommendations of other packages that we might be able to distribute with our code and use from OCaml - do you @wds15 or @andre.pfeuffer know if Sage, Maxima, Mathematica, etc. would be usable in that way? Either way, the steps here can begin right away! Here is what I think the steps look like, super roughly:
- Get the symbolic differentiation engine linking with OCaml in our build process.
- Write the adapter necessary to convert between MIR and whatever input format the differentiation library uses, and back again.
- Add something to the MIR so that we can represent the necessary Jacobian code, and figure out how to represent the combination of the original function with its Jacobian. I think we likely want to end up constructing a single function that returns both a value and its gradient, but we might need to adjust the MIR to make that possible (e.g. add tuple return types, or a special global var a la …).
- See how far that gets us! We’ll also need robustness measures / heuristics to back off from a symbolic derivative if it seems like it will be slower than what we’d get through autodiff, or if deriving the symbolic derivative would take too long.
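As a concrete (if toy) illustration of the steps above, here is a minimal symbolic differentiator in Python over tuple-encoded expressions, ending in a single function that returns both a value and its gradient. Everything here is a hypothetical sketch; the actual plan is to hand MIR expressions to a library like SymEngine and translate the result back into the MIR.

```python
import math

# Toy expression IR: the variable "x", a float constant, or a tuple like
# ("add", a, b), ("mul", a, b), ("div", a, b), ("log", a). Invented for
# this sketch; not the real MIR.

def diff(e):
    """Return the symbolic derivative of e with respect to "x"."""
    if e == "x":
        return 1.0
    if isinstance(e, float):
        return 0.0
    op = e[0]
    if op == "add":
        return ("add", diff(e[1]), diff(e[2]))
    if op == "mul":  # product rule
        return ("add", ("mul", diff(e[1]), e[2]), ("mul", e[1], diff(e[2])))
    if op == "log":  # d/dx log(u) = u' / u
        return ("div", diff(e[1]), e[1])
    raise ValueError(f"unknown op: {op}")

def evaluate(e, x):
    """Evaluate an expression at the point x."""
    if e == "x":
        return x
    if isinstance(e, float):
        return e
    op = e[0]
    if op == "add":
        return evaluate(e[1], x) + evaluate(e[2], x)
    if op == "mul":
        return evaluate(e[1], x) * evaluate(e[2], x)
    if op == "div":
        return evaluate(e[1], x) / evaluate(e[2], x)
    if op == "log":
        return math.log(evaluate(e[1], x))
    raise ValueError(f"unknown op: {op}")

def value_and_grad(e, x):
    """One function returning both value and gradient, as discussed above."""
    return evaluate(e, x), evaluate(diff(e), x)
```

The `value_and_grad` shape is the point: emitting one function that computes both lets the generated C++ skip autodiff entirely for that call, which is where the robustness heuristics above come in (fall back to autodiff when the symbolic form blows up).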