OCaml & symbolic differentiation?


#1

Hi!

I was wondering: how easy would it be to perform symbolic differentiation of simple algebraic expressions in OCaml?

This would be a really cool thing to have with stanc3 and ODEs. One can easily find examples of this on the net, so it seems possible to do.

The intent here is that one could potentially generate the Jacobian of an ODE RHS function (which is usually really simple algebraic stuff) and then auto-generate the necessary source code for the ODE integrators. This will give ~3x speedups for ODE integration problems (for cases I tested myself).
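To illustrate, here is a minimal sketch of what this could look like in OCaml. The toy expression type and the jacobian helper are purely hypothetical, not stanc3's actual IR:

```ocaml
(* Toy expression type; illustrative only, not stanc3's real IR. *)
type expr =
  | Const of float
  | Var of string
  | Add of expr * expr
  | Mul of expr * expr
  | Sin of expr
  | Cos of expr

(* Symbolic derivative of an expression with respect to variable x. *)
let rec diff x = function
  | Const _ -> Const 0.
  | Var y -> Const (if y = x then 1. else 0.)
  | Add (a, b) -> Add (diff x a, diff x b)
  | Mul (a, b) -> Add (Mul (diff x a, b), Mul (a, diff x b))
  | Sin a -> Mul (Cos a, diff x a)
  | Cos a -> Mul (Const (-1.), Mul (Sin a, diff x a))

(* Jacobian of an ODE RHS: differentiate each component w.r.t. each state. *)
let jacobian states rhs_components =
  List.map (fun f -> List.map (fun x -> diff x f) states) rhs_components
```

Of course, a real version would also need simplification (pruning the `Const 0.` terms this produces), which is where the algebra packages earn their keep.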

Best,
Sebastian


#2

Excellent idea. Taking your idea a step further:
I wonder if the parser could be written with the support of an algebra package, e.g.
Sage (http://doc.sagemath.org/html/en/tutorial/index.html)? This would allow
symbolic differentiation and the use of other nice math functions.


#3

We should probably attempt to make this possible. If not included in vanilla Stan, then the parser can hopefully be written modularly enough to make it happen. As far as I understand, the goal is to parse Stan programs into an intermediate representation before going to C++. Maybe some clever extensions can operate on that intermediate representation and apply stuff from Sage, Maxima, or Mathematica.

The benefit of going with OCaml would be that it is the same language as the stanc3 parser and as such does not add a dependency… and for ODEs we are really talking about darn simple expressions in most cases, such that a reduced symbolic differentiation functionality is totally fine (and the benefit is really large).


#4

Easy. We’ll be able to do that in the OCaml compiler. It will require coding it all for the cases we care about, but that’s relatively easy, at least locally.


#5

How do we get there? I mean, should we prepare for this now in some way? Having analytic Jacobians for the ODE stuff will speed it up tremendously (~3x in my examples). Hence, I would love to put this on the roadmap, and ideally on the shorter-term roadmap (shorter relative to the OCaml transform).

@Matthijs any comments or thoughts on this? Should we discuss?

ODE expressions are usually very simple algebraic expressions and maybe a few simple functions (exp, log, sin, cos… the basics).


#6

I’ve been thinking about this and advocating it for the past couple of months, and it’s probably a good idea to formulate the plan on a roadmap (soon to come!). I’ll take this opportunity to write briefly and informally what I think might be a good starting point for a roadmap w.r.t. the new Stan compiler and what it enables. [x] indicates stuff that’s already been done (often by @Matthijs!), [ ] is stuff to do that no one is working on, and [XX%] is in progress by someone and roughly XX% done. Skip to the last section to focus on the work on symbolic derivatives for Stan functions, which can start right away!

Stanc3 vague draft roadmap

Phase 1: stanc3 replicates all Stan 2 functionality

[x] Parse all valid Stan 2 models
[x] Reject all invalid Stan 2 models
[x] Output decent error messages that seem at least as good as those from Stan 2
[x] Define the final Abstract Syntax Tree and an s-expression representation of it as a third-party hook
[95%] Define the Middle Intermediate Representation (MIR) on which we will run optimization passes and output C++ code
[ ] Use static & data flow analysis to add dependency information to the MIR indicating which statements impact other, later statements
[ ] Construct optimization passes like Common Sub-expression Elimination, Loop-invariant Code Motion, and peephole algebraic optimizations (log(1-x) -> log1m(x), etc.)
[ ] Use @mgorinova’s work to analyze the proper blocks for Stan statements to be in for maximum performance
[45%] Emit a Stan model in C++ that uses the Stan Math library (Sean, Bob)
[ ] Emit Tensorflow Probability or PyTorch Python code from the MIR
[ ] Use symbolic differentiation on Stan functions if possible to define analytic gradients automatically.
[ ] Move as much of the Math library into Stan code as possible, where leveraging the symbolic derivatives is equally performant and where we can do whole-program optimization on the resulting Stan program + Stan Math library MIR
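As an aside, a peephole pass like the log(1-x) -> log1m(x) rewrite above can be sketched in a few lines of OCaml. The mini-IR and the function-name strings here are hypothetical stand-ins for the real MIR:

```ocaml
(* Hypothetical mini-IR; FunApp models a Stan function call. *)
type expr =
  | Lit of float
  | Var of string
  | FunApp of string * expr list

(* One peephole pass: rewrite log(1 - x) into the numerically stabler
   log1m(x), recursing into subexpressions to catch nested occurrences. *)
let rec peephole = function
  | FunApp ("log", [FunApp ("subtract", [Lit 1.; x])]) ->
      FunApp ("log1m", [peephole x])
  | FunApp (f, args) -> FunApp (f, List.map peephole args)
  | e -> e
```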

Phase 2: Compatible Stan Language additions

Nothing here is done yet, but we have some goals:

  • Allow users to more easily define and share libraries of Stan functions and models (model fragments? structs?)
  • Make it easier to deal with incoming categorical data (dictionaries?)
  • Add user-defined gradients, at least for first-order derivatives
  • Add mechanism for annotating Stan code (e.g. for grouping parameters, sending things to the GPU, outputting variables or suppressing their output, function purity, broadcasting, etc).

Phase 3: Backwards incompatibility

We’ll need to remove certain things that are currently valid Stan to aid in the development of future features. This will include all things currently marked “deprecated” by the latest Stan 2 compiler, but otherwise this list is pretty up in the air. The only certainty seems to be that we need to change where the array size goes when declaring an array. I’d also like to consider removing arrays of doubles from the language. (@Bob_Carpenter what else goes here? I’m blanking)

Adding symbolic differentiation

To add this, we’ll need to access a symbolic differentiation library from OCaml (so that it can operate on the MIR). One likely candidate seems to be SymEngine, the C++ backend that drives SymPy. There is a half-done OCaml backend that tries to work with it here that we could try to leverage. I am looking for recommendations of other packages that we might be able to distribute with our code and use from OCaml - do you @wds15 or @andre.pfeuffer know if Sage, Maxima, Mathematica, etc. would be usable in that way? Either way, the steps here can begin right away! Here is what I think the steps look like, super roughly:

  1. Get the symbolic differentiation engine linking with OCaml in our build process.
  2. Write the adapter necessary to convert between MIR and whatever input format the differentiation library uses, and back again.
  3. Add something to the MIR such that we can represent the necessary Jacobian code, and figure out how to represent the conglomerate of the original function with its Jacobian. I think we likely want to end up constructing a single function that returns both a value and its gradient, but we might need to adjust the MIR to make that possible (e.g. add tuple return types, or a special global var a la the operands_and_partials API).
  4. See how far that gets us! We’ll also need robustness measures / heuristics to back off from a symbolic derivative if it seems like it will be slower than what we’d get through autodiff, or if deriving the symbolic derivative would take too long.
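To make step 3 concrete, here is a rough, hypothetical sketch of a tuple-return function that yields a value together with its gradient. The expr type and names are stand-ins for the real MIR:

```ocaml
(* Toy expression type; a stand-in for the real MIR. *)
type expr =
  | Const of float
  | Var of string
  | Add of expr * expr
  | Mul of expr * expr

(* Symbolic derivative w.r.t. variable x. *)
let rec diff x = function
  | Const _ -> Const 0.
  | Var y -> Const (if y = x then 1. else 0.)
  | Add (a, b) -> Add (diff x a, diff x b)
  | Mul (a, b) -> Add (Mul (diff x a, b), Mul (a, diff x b))

(* Evaluate an expression in an environment of (name, value) pairs. *)
let rec eval env = function
  | Const c -> c
  | Var x -> List.assoc x env
  | Add (a, b) -> eval env a +. eval env b
  | Mul (a, b) -> eval env a *. eval env b

(* A single function returning both the value and the gradient w.r.t.
   each argument, in the spirit of operands_and_partials. *)
let value_and_grad args body env =
  (eval env body, List.map (fun x -> eval env (diff x body)) args)
```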

#7

Sounds like a good direction … but :D … what my benchmarks with the ODE system showed me is that providing the ODE RHS with analytic gradients via operands_and_partials did not speed up anything. The reason is that our AD tree is costly to build and tear down inside an ODE integrator, even with analytic gradients. You really need to generate dedicated code for the ODE RHS and its Jacobian to get it faster (a lot faster).

So at least if we target ODE solving, we need to take this into consideration as well.

I can have a look at available packages… Mathematica would be extremely powerful, but it costs a ton of $$$.


#8

I am pretty sure SymEngine is the best candidate. Sage, Maxima, and Mathematica are all huge, although possibly you could make API calls to WolframAlpha. GiNaC is a possibility but GPL.


#9

Can you point me to what this looks like? Are you saying we could symbolically generate this, just not with the operands_and_partials API? What would an API for that look like? Sorry, I’m pretty much totally unfamiliar with the innards of the ODE solver atm but want to learn.


#10

No reason to be sorry about not knowing the innards of the ODE system.

A somewhat advanced start to dive into this could be the ongoing PR to speed things up: https://github.com/stan-dev/math/pull/1066

This PR shows nicely how avoiding frequent re-allocation of the parameters can already speed things up.

The complication of the ODE system is that, in addition to the user-supplied ODE, we need to solve the sensitivity ODE RHS, which derives from the base ODE. A reminder of what this looks like can be found here: https://github.com/stan-dev/math/files/2659821/ode_system.pdf .
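For reference, this is the standard forward-sensitivity construction: given a base ODE $\dot y = f(y, \theta)$, the sensitivities $s_i = \partial y / \partial \theta_i$ satisfy

```latex
\dot{s}_i = \frac{\partial f}{\partial y}\, s_i + \frac{\partial f}{\partial \theta_i},
\qquad s_i(t_0) = \frac{\partial y_0}{\partial \theta_i}
```

so each parameter adds another state-sized ODE, all reusing the Jacobian $\partial f / \partial y$, which is exactly where generated analytic Jacobians help.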

I have written a small R extension for code-generating the ODE sensitivities here: https://github.com/stan-dev/rstan/tree/rstanode-proto/rstanode That works nicely and gives me good speedups. With earlier versions of this code I found out that providing the ODE RHS Jacobians using operands_and_partials is not useful.


#11

Sage/Maxima is GPL and Mathematica is commercial. My first thought was to write a Mathematica script to do the differentiation, e.g. D[f, x], then apply Simplify or FullSimplify, and finally use http://library.wolfram.com/infocenter/MathSource/810/ to avoid duplicate calculations.
Stan’s OCaml compiler just needs to provide the necessary C++ interfacing.
I assume that the critical-path calculations can be outsourced into functions.
The connections between the code can be handled by Stan’s autodiff, optionally with help from SymEngine.
For maintainability I prefer loosely coupled systems.


#12

You’ll find that symbolic differentiation is going to be tangled up heavily with optimizations in Stan. If we’re going to try to link to something external, then

  • using a GPL-ed package would have a huge impact on our aggregate license, so I’d rather stay away from it.

  • we should focus on something we can extend, because they probably won’t have the derivatives we need built in unless we expand all of our functions


#13

Same! I think we’re leaning more towards that lately, too.

Just to summarize a little more: I think if you put together what @andre.pfeuffer and @Bob_Carpenter said, you end up wanting a system we can add without ruining our license that, crucially, has the hooks we need to add our own arbitrary derivatives, because we don’t want to start maintaining a fork of a symbolic algebra library. Sounds like GPL isn’t going to work for us, so Sage/Maxima is out, and Mathematica is commercial, so it’s out. SymEngine is MIT- and BSD-licensed; is that good for us @Bob_Carpenter? Is that literally the only choice left? If so, I hope it has hooks for custom derivatives (anyone know? @wds15 or @bgoodri?).


#14

Not a developer, just love Stan! Hopefully this helps: according to Wikipedia, there are five BSD-licensed computer algebra systems.

| System | Creator | Started | First release | Latest release | License | Notes |
|---|---|---|---|---|---|---|
| Axiom | Richard Jenks | 1977 | 1993 and 2002 | August 2014 | Free modified BSD license | General purpose CAS. Continuous release using Docker containers |
| FriCAS | Waldek Hebisch | 2007 | 2007 | 1.3.5 (3 February 2019) | Free modified BSD license | Full-featured general purpose CAS. Especially strong at symbolic integration |
| OpenAxiom | Gabriel Dos Reis | 2007 | 2007 | 1.4.2 (2013) | Free modified BSD license | General purpose CAS. A fork of Axiom |
| Reduce | Anthony C. Hearn | 1960s | 1968 | 2018 | Free modified BSD license | Historically important general purpose CAS. Still alive; open-sourced and freed in December 2008 |
| SymPy | Ondřej Čertík | 2006 | 2007 | 1.3 (14 September 2018) | Free modified BSD license | Python-based |

#15

That’s a really interesting list. Here is a short history and comparison of this class of software: https://youtu.be/M742_73edLA?t=745


#16

From the math library perspective (and this is coming from a Mathematica guy that loves symbolics…):
I always thought it would be cool to use Eigen expression templates to build the derivative rules. Eigen expressions naturally change their data type as you build up the expression. Something like a+b*c has an internal scalar type (ignoring whether these are matrices, vectors, etc) similar to:
internal::scalar_sum_op<double,internal::scalar_product_op<double,double>::result_type>
In other words, the resulting expression type contains enough information to write a metaprogram to descend the tree and write out the appropriate reverse mode rules as a function of the leaf nodes only, skipping the generation of all the temporaries.
Writing it this way could drastically reduce the size of the differentiation stack, and make it possible to write and maintain less autodiff code in general.
Lastly, there is nothing saying you couldn’t also enable it for scalars by stuffing a 1x1 matrix inside some other non-eigen type.