OCaml & symbolic differentiation?

Hi!

I was wondering how easy it would be to perform symbolic differentiation of simple algebraic expressions in OCaml.

This would be a really cool thing to have with stanc3 and ODEs. One can easily find examples of this on the net, so it seems possible to do.

The intent here is that one could potentially generate the Jacobian of an ODE RHS function (which is usually really simple algebraic stuff) and then auto-generate the necessary source code for the ODE integrators. This gives ~3x speedups for ODE integration problems (in the cases I tested myself).
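For illustration, here is roughly what a minimal symbolic differentiator looks like in OCaml. This is a toy sketch: the expression type and constructor names are made up for this post, not stanc3's actual AST or MIR.

```ocaml
(* Toy expression type covering the kind of simple algebra found in
   ODE right-hand sides. Hypothetical; not stanc3's actual AST. *)
type expr =
  | Const of float
  | Var of string
  | Add of expr * expr
  | Mul of expr * expr
  | Sin of expr
  | Cos of expr
  | Exp of expr

(* Symbolic derivative w.r.t. variable x, by structural recursion:
   sum rule, product rule, and chain rule. *)
let rec diff x = function
  | Const _ -> Const 0.
  | Var y -> if y = x then Const 1. else Const 0.
  | Add (a, b) -> Add (diff x a, diff x b)
  | Mul (a, b) -> Add (Mul (diff x a, b), Mul (a, diff x b))
  | Sin e -> Mul (Cos e, diff x e)
  | Cos e -> Mul (Const (-1.), Mul (Sin e, diff x e))
  | Exp e -> Mul (Exp e, diff x e)

(* Numeric evaluation, handy for sanity-checking a derivative. *)
let rec eval env = function
  | Const c -> c
  | Var y -> List.assoc y env
  | Add (a, b) -> eval env a +. eval env b
  | Mul (a, b) -> eval env a *. eval env b
  | Sin e -> sin (eval env e)
  | Cos e -> cos (eval env e)
  | Exp e -> exp (eval env e)
```

For example, `diff "x" (Mul (Var "x", Sin (Var "x")))` produces an (unsimplified) representation of `sin x + x * cos x`. A simplification pass would normally follow, but even this skeleton covers the "darn simple expressions" case.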

Best,
Sebastian

1 Like

Excellent idea. Taking it a step further: I wonder whether the parser could be written with the support of a computer algebra package, e.g. Sage (see the Sage Tutorial)? That would allow doing symbolic differentiation and using other nice math functions.

We should probably attempt to make this possible. If not included in vanilla Stan, then the parser can hopefully be written in a modular enough way to make it happen. As far as I understand, the goal is to parse Stan programs into an intermediate representation before going to C++. Maybe some clever extensions could deal with that intermediate representation and apply stuff from Sage, Maxima, or Mathematica.

The benefit of going with OCaml would be that it is the same language as the stanc3 parser and as such does not add a dependency… and for ODEs we are really talking about darn simple expressions in most cases, such that a reduced symbolic differentiation facility is totally fine (and the benefit is really large).

1 Like

Easy. We’ll be able to do that in the OCaml compiler. It will require coding it all for the cases we care about, but that’s relatively easy, at least locally.

2 Likes

How do we get there? I mean, should we already be preparing for this in some way? Having analytic Jacobians for the ODE stuff will speed it up tremendously (~3x in my examples). Hence, I would love to put this on the roadmap, and ideally on the shorter-term roadmap (shorter relative to the OCaml transition).

@Matthijs any comments or thoughts on this? Should we discuss?

ODE expressions are usually very simple algebraic expressions plus maybe a few simple functions (exp, log, sin, cos… the basics).

I’ve been thinking about this and advocating for it for the past couple of months, and it’s probably a good idea to formulate the plan on a roadmap (soon to come!). I’ll take this opportunity to write briefly and informally what I think might be a good starting point for a roadmap w.r.t. the new Stan compiler and what it enables. [x] indicates stuff that’s already been done (often by @Matthijs!), [ ] is stuff to do that no one is working on, and [XX%] is in progress by someone and roughly XX% done. Skip to the last section to focus on the work on symbolic derivatives for Stan functions that can start right away!

Stanc3 vague draft roadmap

Phase 1: stanc3 replicates all Stan 2 functionality

[x] Parse all valid Stan 2 models
[x] Reject all invalid Stan 2 models
[x] Output decent error messages that seem at least as good as those from Stan 2
[x] Define the final Abstract Syntax Tree and an s-expression representation of it as a third-party hook
[95%] Define the Middle Intermediate Representation (MIR) on which we will run optimization passes and output C++ code
[ ] Use static & data flow analysis to add dependency information to the MIR indicating which statements impact other, later statements
[ ] Construct optimization passes like Common Sub-expression Elimination, Loop-Invariant Code Motion, and peephole algebraic optimizations (log(1-x) -> log1m(x), etc.)
[ ] Use @mgorinova’s work to analyze the proper blocks for Stan statements to be in for maximum performance
[45%] Emit a Stan model in C++ that uses the Stan Math library (Sean, Bob)
[ ] Emit Tensorflow Probability or PyTorch Python code from the MIR
[ ] Use symbolic differentiation on Stan functions if possible to define analytic gradients automatically.
[ ] Move as much of the Math library into Stan code as possible, where leveraging the symbolic derivatives is equally performant and where we can do whole-program optimization on the resulting Stan program + Stan Math library MIR
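As a concrete illustration of the peephole item in the list above, a rewrite like log(1-x) -> log1m(x) is essentially a pattern match over the IR. This is a toy sketch on a made-up expression type, not the real MIR:

```ocaml
(* Toy IR: just enough structure to express the rewrite.
   Hypothetical; not stanc3's real MIR. *)
type expr =
  | Var of string
  | Const of float
  | Sub of expr * expr
  | Call of string * expr list

(* Peephole pass: rewrite log(1 - e) into the numerically more stable
   log1m(e), recursing so nested occurrences are caught too. *)
let rec peephole e =
  match e with
  | Call ("log", [ Sub (Const 1., inner) ]) ->
      Call ("log1m", [ peephole inner ])
  | Call (f, args) -> Call (f, List.map peephole args)
  | Sub (a, b) -> Sub (peephole a, peephole b)
  | Var _ | Const _ -> e
```

A real pass would carry a table of such rewrites and run them to a fixed point, but the per-rule shape is just this: match a pattern, emit the cheaper equivalent, recurse everywhere else.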

Phase 2: Compatible Stan Language additions

Nothing here is done yet, but we have some goals:

  • Allow users to more easily define and share libraries of Stan functions and models (model fragments? structs?)
  • Make it easier to deal with incoming categorical data (dictionaries?)
  • Add user-defined gradients, at least for first-order derivatives
  • Add mechanism for annotating Stan code (e.g. for grouping parameters, sending things to the GPU, outputting variables or suppressing their output, function purity, broadcasting, etc).

Phase 3: Backwards incompatibility

We’ll need to remove certain things that are currently valid Stan to aid in the development of future features. This will include all things currently marked “deprecated” by the latest Stan 2 compiler, but otherwise this list is pretty up in the air. The only certainty seems to be that we need to change where the array size goes when declaring an array. I’d also like to consider removing arrays of doubles from the language. (@Bob_Carpenter what else goes here? I’m blanking)

Adding symbolic differentiation

To add this, we’ll need to access a symbolic differentiation library from OCaml (so that it can operate on the MIR). One likely candidate seems to be SymEngine, the C++ backend that drives SymPy. There is a half-done OCaml backend that tries to work with it here that we could try to leverage. I am looking for recommendations of other packages that we might be able to distribute with our code and use from OCaml - do you, @wds15 or @andre.pfeuffer, know whether Sage, Maxima, Mathematica, etc. would be usable in that way? Either way, the steps here can begin right away! Here is what I think the steps look like, super roughly:

  1. Get the symbolic differentiation engine linking with OCaml in our build process.
  2. Write the adapter necessary to convert between MIR and whatever input format the differentiation library uses, and back again.
  3. Add something to the MIR such that we can represent the necessary Jacobian code, and figure out how to represent the conglomerate of the original function with its Jacobian. I think we likely want to end up constructing a single function that returns both a value and its gradient, but we might need to adjust the MIR to make that possible (e.g. add tuple return types, or a special global var a la the operands_and_partials API).
  4. See how far that gets us! We’ll also need robustness measures / heuristics to back off from a symbolic derivative if it seems like it will be slower than what we’d get through autodiff, or if deriving the symbolic derivative would take too long.
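To make step 3 above concrete, the "single function returning both value and gradient" shape could look like the following in OCaml. The function and its body are hand-written here for an invented f(x, y) = x*y + sin x; a real compiler would emit this from the symbolic derivatives:

```ocaml
(* Hypothetical shape of generated "value plus partials" code for
   f(x, y) = x * y + sin x, returning the value together with its
   gradient as a tuple. Hand-written for illustration only. *)
let f_with_grad x y =
  let v = (x *. y) +. sin x in
  let dx = y +. cos x in (* df/dx *)
  let dy = x in          (* df/dy *)
  (v, (dx, dy))
```

The point of this form is sharing: sub-expressions common to the value and the partials are computed once in the same body, which is exactly what separate value and gradient functions would miss.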
2 Likes

Sounds like a good direction … but :D … what my benchmarks with the ODE system showed me is that providing the ODE RHS with analytic gradients via operands_and_partials did not speed anything up. The reason is that our AD tree is costly to build and tear down inside an ODE integrator, even with analytic gradients. You really need to generate dedicated code for the ODE RHS and its Jacobian to get that faster (a lot faster).

So if we target ODE solving, we need to take this into consideration as well.

I can have a look at available packages… Mathematica would be extremely powerful, but it costs a ton of $$$.

I am pretty sure SymEngine is the best candidate. Sage, Maxima, and Mathematica are all huge, although possibly you could make API calls to WolframAlpha. GiNaC is a possibility but GPL.

1 Like

Can you point me to what this looks like? Are you saying we could symbolically generate this, just not with the operands_and_partials API? What would an API for that look like? Sorry, I’m pretty much totally unfamiliar with the innards of the ODE solver atm but want to learn.

No reason to be sorry about not knowing the innards of the ODE system.

A somewhat advanced start to dive into this could be the ongoing PR to speed things up: https://github.com/stan-dev/math/pull/1066

This PR shows nicely how avoiding frequent re-allocation of the parameters can already speed things up.

The complication of the ODE system is that, in addition to the user-supplied ODE, we need to solve the sensitivity ODE RHS, which derives from the base ODE. What helps me remind myself how this looks is found here: https://github.com/stan-dev/math/files/2659821/ode_system.pdf .
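In brief (my paraphrase of the standard forward-sensitivity formulation, which is what that note writes out): for a base ODE

$$\frac{dy}{dt} = f(y, \theta, t),$$

each sensitivity $s_k = \partial y / \partial \theta_k$ satisfies its own ODE

$$\frac{ds_k}{dt} = \frac{\partial f}{\partial y}\, s_k + \frac{\partial f}{\partial \theta_k},$$

so the Jacobian $\partial f / \partial y$ of the RHS shows up in every sensitivity equation. That Jacobian is exactly the piece symbolic differentiation would let us code-generate instead of autodiffing inside the integrator.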

I have written a small R extension for code-generating the ODE sensitivities here: https://github.com/stan-dev/rstan/tree/rstanode-proto/rstanode That works nicely and gives me good speedups. With earlier versions of this code I found out that providing the ODE RHS Jacobians using operands_and_partials is not useful.

2 Likes

Sage/Maxima is GPL and Mathematica is commercial. My first thought was to write a Mathematica script to do the differentiation, e.g. D[f, x], then apply Simplify or FullSimplify, and finally use http://library.wolfram.com/infocenter/MathSource/810/ to avoid duplicate calculations.
Stan’s OCaml compiler would just need to provide the necessary C++ interfacing.
I assume that the critical-path calculations can be outsourced into functions.
The connections between the pieces of code can be handled by Stan’s autodiff, optionally with help from SymEngine.
For maintainability I prefer loosely coupled systems.

You’ll find that symbolic differentiation is going to be tangled up heavily with optimizations in Stan. If we’re going to try to link to something external, then

  • using a GPL-ed package would have a huge impact on our aggregate license, so I’d rather stay away from it.

  • we should focus on something we can extend, because they probably won’t have the derivatives we need built in unless we expand all of our functions.

1 Like

Same! I think we’re leaning more towards that lately, too.

Just to summarize a little more: I think if you put together what @andre.pfeuffer and @Bob_Carpenter said, you end up wanting a system we can add without ruining our license that, crucially, has the hooks we need to add our own arbitrary derivatives, because we don’t want to start maintaining a fork of a symbolic algebra library. Sounds like GPL isn’t going to work for us, so Sage/Maxima is out, while Mathematica is commercial, so it’s out. SymEngine is MIT/BSD licensed - is that good for us, @Bob_Carpenter? Is that literally the only choice left? If so, I hope it has hooks for custom derivatives (anyone know? @wds15 or @bgoodri?).

Not a developer, just love Stan! Hopefully this helps: according to Wikipedia, there are five BSD-licensed computer algebra systems.

| Name | Creator | Started | First release | Latest version | License | Notes |
|---|---|---|---|---|---|---|
| Axiom | Richard Jenks | 1977 | 1993 and 2002 | August 2014 | Free, modified BSD | General-purpose CAS. Continuous release using Docker containers |
| FriCAS | Waldek Hebisch | 2007 | 2007 | 1.3.5 (3 February 2019) | Free, modified BSD | Full-featured general-purpose CAS. Especially strong at symbolic integration |
| OpenAxiom | Gabriel Dos Reis | 2007 | 2007 | 1.4.2 (2013) | Free, modified BSD | General-purpose CAS. A fork of Axiom |
| Reduce | Anthony C. Hearn | 1960s | 1968 | 2018 | Free, modified BSD | Historically important general-purpose CAS. Still alive; open-sourced and freed in December 2008 |
| SymPy | Ondřej Čertík | 2006 | 2007 | 1.3 (14 September 2018) | Free, modified BSD | Python-based |
1 Like

That’s a really interesting list. Here is a short history and comparison of this class of software: https://youtu.be/M742_73edLA?t=745

1 Like

That’s the basis of the Adept autodiff system. It uses expression template reduction to unfold derivative expressions analytically and does just what @ChrisChiasson is suggesting in reducing the size of the expression graph (the autodiff stack in the implementation).

It’s a lot better than naive autodiff. But it’s not going to lead to things like good sharing between function and derivative evaluation or the kind of laziness we can get with custom analytic derivatives.

1 Like

We can either stick to BSD compatible licenses without copyleft to keep our lives simple or figure out some kind of plugin architecture.

1 Like

Given that we have a really nice language and framework for this stuff in OCaml now, I’d probably prefer that. We already have pretty long compile times without exploiting the Turing-complete nature of C++ templates. But it is a cool idea and might make for an interesting project.

It would be nice to try to create a mini-implementation that required only Eigen and used just these techniques (common sub-expression elimination after elimination of unused expression nodes). It makes me wonder whether the approach is general enough to do both forward and reverse mode with the same code, and whether all data-type-specific operations could be eliminated.