Autodiff expression graph optimizations

seantalts · March 8, 2017, 7:02pm

Hey, Bob and I have been occasionally talking about potential compiler/query-style optimizations for the autodiff expression graph: things like common subexpression elimination, node fusion, and a variety of peephole optimizations where we can recognize a pattern and replace it with something more efficient.
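As a rough illustration of common subexpression elimination on an expression graph, here is a minimal Python sketch that uses hash-consing: a node is only created if no structurally identical node exists yet. The `Node` and `Graph` names and the string op codes are hypothetical and chosen for this example; this is not how the Stan math library represents its autodiff stack.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True, eq=False)  # eq=False: nodes compare and hash by identity
class Node:
    op: str                        # e.g. "var", "const", "add", "mul", "exp"
    args: Tuple["Node", ...] = ()  # operand nodes
    label: object = None           # variable name or constant value


class Graph:
    """Builds expression nodes, reusing structurally identical ones."""

    def __init__(self) -> None:
        self._table: Dict[tuple, Node] = {}

    def make(self, op: str, *args: Node, label: object = None) -> Node:
        # Operands are already interned, so identity of the operands plus
        # the op and label determines structural equality of the new node.
        key = (op, args, label)
        node = self._table.get(key)
        if node is None:
            node = Node(op, args, label)
            self._table[key] = node
        return node

    def size(self) -> int:
        return len(self._table)


g = Graph()
x = g.make("var", label="x")
y = g.make("var", label="y")
# exp(x * y) + exp(x * y): the product and the exp are each built only once.
e = g.make("add", g.make("exp", g.make("mul", x, y)),
                  g.make("exp", g.make("mul", x, y)))
print(g.size())  # number of distinct nodes in the graph
```

In a reverse-mode setting the shared node would receive adjoint contributions from both of its uses, so the saving applies to the backward pass as well as the forward pass.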

TensorFlow recently released an experimental library called XLA that does some of this kind of stuff, plus JIT compilation to CPU or GPU. I haven’t looked into it that much, but I think it’s solving an extremely similar problem to the Stan math library and has many more resources behind it, so it might be interesting to learn about what they’re doing and implement some of their ideas.

https://www.tensorflow.org/versions/master/experimental/xla/

seantalts · March 8, 2017, 7:03pm

Here are Bob’s comments on this stuff:

I see two angles there. The JIT stuff and the peephole
optimization for folding compound operations. Certainly
we can do the latter, which will help us optimize more
naively written programs (not very good for bragging about
model speed, but probably good for users). Some of the
JIT advantages we get from C++ static analysis, but most
we won’t get or couldn’t use, like the GPU stuff (at least
until we find someone to plumb in the Eigen GPU stuff
under our matrix ops).

If you read the Adept autodiff paper, you can see how
they use template metaprogramming to do something similar,
but one level of abstraction lower (it’s more like it
runs reverse mode statically over a local graph).

There’s a third angle, which is analyzing the entire
graph of operations for things like parallelism or sparsity.
I assume TensorFlow does a lot of the former, and there’s
a very large autodiff literature on the latter. Both can
be intractable problems to solve exactly, but
presumably there are useful heuristics, as for other
optimizations.

Do you have any idea how TensorFlow does autodiff? I’d
love to benchmark what they have versus Stan and also
compare functionality.
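To make the "peephole optimization for folding compound operations" idea concrete, here is a small Python sketch. It assumes a toy expression format of nested tuples (hypothetical, for illustration only) and a single rewrite rule that folds `log(1 + exp(u))` into one fused `log1p_exp(u)` node, which can then be evaluated in a numerically stable way.

```python
import math


def peephole(expr):
    """Rewrite bottom-up, folding log(1 + exp(u)) into log1p_exp(u)."""
    if expr[0] in ("var", "const"):
        return expr
    op, *args = expr
    args = [peephole(a) for a in args]
    if op == "log" and args[0][0] == "add":
        lhs, rhs = args[0][1], args[0][2]
        # Addition commutes, so accept the constant on either side.
        for one, other in ((lhs, rhs), (rhs, lhs)):
            if one == ("const", 1.0) and other[0] == "exp":
                return ("log1p_exp", other[1])
    return (op, *args)


def evaluate(expr, env):
    """Evaluate a toy expression given variable values in env."""
    op = expr[0]
    if op == "var":
        return env[expr[1]]
    if op == "const":
        return expr[1]
    vals = [evaluate(a, env) for a in expr[1:]]
    if op == "add":
        return vals[0] + vals[1]
    if op == "exp":
        return math.exp(vals[0])
    if op == "log":
        return math.log(vals[0])
    if op == "log1p_exp":
        u = vals[0]
        # Stable form: never exponentiates a large positive number.
        return max(u, 0.0) + math.log1p(math.exp(-abs(u)))
    raise ValueError(f"unknown op: {op}")


naive = ("log", ("add", ("const", 1.0), ("exp", ("var", "x"))))
fused = peephole(naive)
print(fused)
print(evaluate(naive, {"x": 0.5}), evaluate(fused, {"x": 0.5}))
```

Besides replacing three nodes with one, the fused node avoids the overflow that the naively written expression hits for large `x`, which is the kind of benefit for naively written programs mentioned above.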

bgoodri · March 8, 2017, 7:35pm

There is a former QMSS student who is interested in doing this and took an actual class on GPU development.

seantalts · March 8, 2017, 8:35pm

Point him to Discourse or my email; I would be happy to help him figure out how to get started.

Erik_Strumbelj · April 18, 2018, 2:28pm

@seantalts @bgoodri Has there been any progress on this?

@rok_cesnovar and I have also now started discussing how to optimize/parallelize autodiff, in particular in combination with the GPU.
