Help write an autodiff handbook

I’m writing here to ask for help completing an introduction to autodiff with encyclopedic coverage of forward- and reverse-mode tangent and adjoint rules. I put myself down as “editor”, but I’m going to run the whole thing as an open-source project and credit everyone who contributes, so that if lots of people get involved it’ll be like one of those bio or physics papers with a long author list.

People have been urging me to revise our autodiff paper for Stan and extend it to forward mode. Stan’s forward mode is simple and generic, so instead I decided to write a mini-textbook/encyclopedia on autodiff based on Mike Giles’s extended matrix autodiff paper. It’s a public repo on GitHub with code licensed under BSD:

But I need help completing the tangent and adjoint rules for further matrix results, diff eq solvers, algebraic equation solvers, 1D integrators, etc.—all the fun stuff we support in Stan.

The draft is written in bookdown. There are makefiles and build scripts, or you can just go into RStudio, load bookdown, and type render_book("index.Rmd").

In addition to expanded derivations following Giles, there are:

  • intro to forward mode
  • intro to reverse mode
  • intro to mixed mode
  • HMM derivatives (Qin, Auerbach, and Sachs 2000)
  • worked examples
  • working C++ reference code

The reverse-mode code is based on the continuation-based reverse mode I defined as an exercise on Discourse a while ago. The forward-mode code and functionals are based on Stan’s math library.
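For readers who haven’t seen that exercise, here is a minimal sketch of one way to do continuation-style reverse mode in C++, applied to the log(uv) example. It is not the handbook’s reference code, and the names (`sketch::var`, `tape()`, `grad()`) are made up for illustration. Each operation records a closure (a continuation) that pushes its output’s adjoint back to its operands, and `grad()` seeds the output adjoint with 1 and runs the closures in reverse order.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <memory>
#include <vector>

namespace sketch {

// A node in the expression graph: forward value plus accumulated adjoint.
struct node {
  double val;
  double adj = 0.0;
  explicit node(double v) : val(v) {}
};

// User-facing handle; copies share the same underlying node.
struct var {
  std::shared_ptr<node> n;
  explicit var(double v) : n(std::make_shared<node>(v)) {}
  double val() const { return n->val; }
  double adj() const { return n->adj; }
};

// Global tape of continuations; each one propagates one operation's output
// adjoint back to that operation's operands.
inline std::vector<std::function<void()>>& tape() {
  static std::vector<std::function<void()>> t;
  return t;
}

inline var operator*(const var& x, const var& y) {
  var z(x.val() * y.val());
  auto a = x.n, b = y.n, c = z.n;
  tape().push_back([a, b, c] {
    a->adj += c->adj * b->val;  // d(a * b) / da = b
    b->adj += c->adj * a->val;  // d(a * b) / db = a
  });
  return z;
}

inline var log(const var& x) {
  var z(std::log(x.val()));
  auto a = x.n, c = z.n;
  tape().push_back([a, c] { a->adj += c->adj / a->val; });  // d log(a) / da = 1 / a
  return z;
}

// Seed the output adjoint with 1, then run the continuations in reverse.
inline void grad(const var& y) {
  y.n->adj = 1.0;
  for (auto it = tape().rbegin(); it != tape().rend(); ++it) (*it)();
  tape().clear();
}

}  // namespace sketch
```

With `u = 2` and `v = 3`, calling `grad(log(u * v))` leaves `u.adj()` at 0.5 and `v.adj()` at 1/3, matching ∂/∂u log(uv) = 1/u and ∂/∂v log(uv) = 1/v.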

Feedback is also welcome, either here, through GitHub issues, or via email at carp@alias-i.com.

I would suggest incorporating whatever is new in section 4 of

and at least giving a shoutout to

http://www.matrixcalculus.org/

@charlesm93 wrote a nice paper on this! https://arxiv.org/abs/1811.05031

I’m happy to read stuff, but I don’t think I’ve got much to add (except for the eye of someone who is bad with expression trees).

Thanks, @bgoodri. I’ll check it out. I hadn’t seen matrixcalculus.org—that looks super cool. It’d be great if we could work out the derivatives for our transforms and code those more efficiently.

I read the section of Dougal Maclaurin’s Ph.D. thesis on Autograd, which was interesting. They went with immutable matrices. I think we might be able to go down that route with Stan if we allow them to be built from something like a real(int, int) function and a pair of sizes (int, int).
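A minimal sketch of the kind of constructor described above, assuming a dense column-major `double` matrix. The class name `immutable_matrix` is hypothetical and not part of Stan or Autograd: the matrix is filled once from its sizes and a generator `f(i, j)`, and afterwards only read access is exposed.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical immutable matrix: built once from a pair of sizes and a
// generator f(i, j) (the real(int, int) function), with no setters afterwards.
class immutable_matrix {
 public:
  immutable_matrix(int rows, int cols, const std::function<double(int, int)>& f)
      : rows_(rows), cols_(cols) {
    assert(rows >= 0 && cols >= 0);
    data_.reserve(static_cast<std::size_t>(rows) * static_cast<std::size_t>(cols));
    // Fill in column-major order (Eigen's default storage order).
    for (int j = 0; j < cols; ++j)
      for (int i = 0; i < rows; ++i) data_.push_back(f(i, j));
  }

  int rows() const { return rows_; }
  int cols() const { return cols_; }

  // Read-only element access; there is deliberately no non-const overload.
  double operator()(int i, int j) const {
    assert(0 <= i && i < rows_ && 0 <= j && j < cols_);
    return data_[static_cast<std::size_t>(j) * rows_ + i];
  }

 private:
  int rows_;
  int cols_;
  std::vector<double> data_;
};
```

For example, `immutable_matrix m(2, 3, [](int i, int j) { return 10.0 * i + j; });` gives `m(1, 2) == 12.0`. One appeal for reverse mode is that a closure recorded on the tape could share such a matrix rather than copy it, since its values can’t change after construction.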

@Daniel_Simpson: No expression graph yet. Just a lot of adjoint and tangent math. And a C++ reference implementation. I was hoping I could convince @charlesm93 to write the diff eq chapter.

The www.matrixcalculus.org people also have http://www.geno-project.org/ where you can differentiate an objective function with respect to matrices and vectors using a simple language. That is less useful for the handbook but maybe more useful for stuffing everything into a function that evaluates a log-kernel.

> I was hoping I could convince @charlesm93 to write the diff eq chapter.

Heck yeah, let’s talk!

I have some new stuff on differentiating the Laplace approximation that really nails down several important concepts: the utility of forward and reverse mode, the benefits of analytical “super nodes”, and a pretty sweet plot twist that involves finding the right initial cotangent. All this work also relates, to some degree, to discussions @seantalts and I had on differentiating algebraic solvers, which I need to revisit.
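One common reading of an analytical “super node”, sketched here under that assumption and not taken from the Laplace approximation work: a composite function gets a single node that computes its value and its gradient from a hand-derived formula, rather than recording every elementary operation. The hypothetical `lse_node` below does this for log-sum-exp, whose gradient is the softmax.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical "super node" for log-sum-exp: value and gradient are computed
// analytically as one unit instead of recording each exp, add, and log.
struct lse_node {
  std::vector<double> x;  // operand values
  double val = 0.0;       // log(sum_i exp(x_i))

  explicit lse_node(std::vector<double> xs) : x(std::move(xs)) {
    assert(!x.empty());
    // Subtract the max before exponentiating to avoid overflow.
    double m = *std::max_element(x.begin(), x.end());
    double s = 0.0;
    for (double xi : x) s += std::exp(xi - m);
    val = m + std::log(s);
  }

  // Reverse pass: operand adjoints from the output adjoint in one step, using
  // d/dx_i log-sum-exp(x) = exp(x_i - val) = softmax(x)_i.
  std::vector<double> chain(double adj) const {
    std::vector<double> g(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) g[i] = adj * std::exp(x[i] - val);
    return g;
  }
};
```

For x = (1, 2, 3), `lse_node({1.0, 2.0, 3.0}).chain(1.0)` returns the softmax of x, whose entries sum to 1. The graph holds one node and its operand values instead of one node per exp, addition, and log.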

@betanalpha, Vianey, and I have, as you know, things cooking for HMMs. This week, I’m writing it all up in Stan (based on some extensive C++ code produced by Michael). Michael and I (mostly Michael) rederived the ODE adjoint methods, which are better than what we currently have in Stan.

I recommend some of the discussion by @betanalpha on higher-order autodiff: https://arxiv.org/pdf/1812.11592.pdf.

This particular bookdown setup requires a bunch of extra packages, so it doesn’t render out of the box. A list of requirements would be helpful if you’re not going to push the HTML and PDF to the repo for easy access.

In any case, I highly recommend taking the introduction strategy that @charlesm93 used in his review, https://arxiv.org/pdf/1811.05031.pdf. Moving past “autodiff is just implementing the chain rule” to “autodiff propagates differential information through functions” better prepares the reader for autodiff architectures and for more subtle concepts like the need to implement only Jacobian-vector products instead of full Jacobians.
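To make the Jacobian-vector product point concrete, here is a minimal forward-mode sketch with dual numbers. The `fwd` namespace, the `dual` type, and the example function `f` are hypothetical (this is not Stan’s `fvar` type). Seeding the input tangents with a direction v and evaluating f once yields J·v without ever forming J.

```cpp
#include <array>
#include <cassert>
#include <cmath>

namespace fwd {

// Hypothetical forward-mode dual number: a value and a tangent (directional
// derivative) that each operation pushes forward.
struct dual {
  double val;
  double tan;
};

inline dual operator*(dual a, dual b) {
  return {a.val * b.val, a.tan * b.val + a.val * b.tan};  // product rule
}

inline dual sin(dual a) { return {std::sin(a.val), std::cos(a.val) * a.tan}; }

// Example function f : R^2 -> R^2, f(x) = (x_0 * x_1, sin(x_0)).
inline std::array<dual, 2> f(const std::array<dual, 2>& x) {
  return {x[0] * x[1], sin(x[0])};
}

// J_f(x) * v from one forward pass: seed the input tangents with v and read
// off the output tangents; the 2 x 2 Jacobian is never built.
inline std::array<double, 2> jvp(const std::array<double, 2>& x,
                                 const std::array<double, 2>& v) {
  const std::array<dual, 2> y = f({dual{x[0], v[0]}, dual{x[1], v[1]}});
  return {y[0].tan, y[1].tan};
}

}  // namespace fwd
```

At x = (2, 3) the Jacobian of f is [[3, 2], [cos 2, 0]], so `jvp({2.0, 3.0}, {1.0, 0.0})` returns its first column, (3, cos 2). Recovering a full Jacobian this way costs one pass per input; reverse mode instead gives vector-Jacobian products, at one pass per output.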

Expression graphs could be drawn, for example, with the diagram package (maybe you know better options, but it seems good to me). Here is how to draw one for the log(uv) example currently in section 4.1:

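```r
library(diagram)
# Connectivity matrix: A[i, j] = 1 draws an arrow from box j to box i
A <- matrix(0, 5, 5)
colnames(A) <- c("log", "*", "e", "u", "v")
A[2:3, 1] <- 1  # links log with * and e
A[4:5, 2] <- 1  # links * with u and v
# Box centres: x coordinates in column 1, y coordinates in column 2
pos <- matrix(c(0.6, 0.4, 0.8, 0.2, 0.6, 0.8, 0.5, 0.5, 0.2, 0.2), 5, 2, byrow = FALSE)
# cex.txt = 0 hides the coefficient labels on the arrows
plotmat(A, curve = 0, arr.type = "triangle", arr.width = 0.5,
        box.lcol = c("firebrick", "black", "black", "steelblue", "steelblue"),
        cex.txt = 0, pos = pos, box.size = 0.06)
```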

It renders this: [Figure: expression graph for log(uv)]
