Interpreting reverse-mode autodiff

This is a fairly basic thing, but I can’t quite extract it from Giles or the StanMath paper tonight.

If I know that y[j] = 0 structurally (i.e. independent of context; it's just true), does that mean `\bar{y}[j] = 0`?

For context: if A is a sparse matrix and I want to compute y = A*x for vectors x and y, then is \bar{A} sparse?

In the dense case \bar{A} = \bar{y} * x', which is a dense matrix (albeit rank one). This would be inconvenient for me: I would prefer \bar{A} to have the same sparsity as A. But maths doesn't care about my needs.
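A quick numerical check of this claim (a JAX sketch; the particular matrix, vector, and adjoint are just illustrative):

```python
import jax
import jax.numpy as jnp

# y = A @ x, with A treated as the differentiable input.
A = jnp.array([[1.0, 0.0, 0.0],
               [0.0, 2.0, 0.0],
               [0.0, 0.0, 3.0]])  # sparse in structure, stored densely here
x = jnp.array([1.0, 2.0, 3.0])

y, vjp = jax.vjp(lambda A: A @ x, A)
ybar = jnp.ones_like(y)           # some upstream adjoint \bar{y}
(Abar,) = vjp(ybar)

# \bar{A} is the rank-one outer product \bar{y} x', dense even though A is sparse.
print(jnp.allclose(Abar, jnp.outer(ybar, x)))  # True
```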

It matters which things are parameters and which are data.

Suppose I have two vectors x and y and they’re both parameters. Then

d(x' * y) / d x[1] = y[1]

so what you really need to know is the sparsity of y.
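For instance, a minimal JAX check of the dot-product case:

```python
import jax
import jax.numpy as jnp

x = jnp.array([1.0, 2.0, 3.0])
y = jnp.array([0.0, 5.0, 0.0])   # y is sparse: only y[1] is nonzero

# d(x' y)/dx = y, so \bar{x} inherits the sparsity of y.
grad_x = jax.grad(lambda x: jnp.dot(x, y))(x)
print(grad_x)  # [0. 5. 0.], i.e. equal to y
```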

In your case, the sparsity of x is what will determine the sparsity of d(A * x) / dA.
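Concretely, zero entries of x knock out the corresponding columns of \bar{A}. A small JAX sketch (the particular A and x are just examples):

```python
import jax
import jax.numpy as jnp

A = jnp.eye(4)
x = jnp.array([1.0, 0.0, 2.0, 0.0])  # x is zero in positions 1 and 3

y, vjp = jax.vjp(lambda A: A @ x, A)
(Abar,) = vjp(jnp.ones_like(y))

# Column j of \bar{A} = \bar{y} x' is \bar{y} * x[j], so it vanishes
# exactly where x[j] = 0: here columns 1 and 3 are all zeros.
print(Abar)
```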

So does that mean that if y = A*x, then even though A is sparse, \bar{A} is dense whenever x is dense?!

That's very inconvenient. Without additional data structures the storage would be n^2 vars if A is n×n, while the minimal storage (the number of unique vars that need to be stored to represent \bar{A}) is n.
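One way to get at that smaller representation is to differentiate only with respect to the stored nonzero values, which yields one adjoint per stored entry rather than n^2. A sketch assuming a COO layout (`rows`, `cols`, `vals` are illustrative names, not anything from Stan Math):

```python
import jax
import jax.numpy as jnp

# COO representation of a sparse 3x3 matrix A with 4 stored entries.
rows = jnp.array([0, 1, 2, 2])
cols = jnp.array([0, 1, 0, 2])
vals = jnp.array([1.0, 2.0, 3.0, 4.0])  # the only actual vars in A
x = jnp.array([1.0, 2.0, 3.0])
n = 3

def matvec(vals):
    # y[i] = sum over stored entries (i, j, v) of v * x[j]
    return jax.ops.segment_sum(vals * x[cols], rows, num_segments=n)

y, vjp = jax.vjp(matvec, vals)
(vals_bar,) = vjp(jnp.ones_like(y))

# \bar{vals}[k] = \bar{y}[rows[k]] * x[cols[k]]: one adjoint per
# stored nonzero, not n^2 of them.
print(vals_bar.shape)  # (4,)
```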

Sorry—I don’t have the power to change how derivatives work!

d(u * v) / du = v, even if u = 0.
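A one-line check in JAX:

```python
import jax

# d(u * v)/du = v even at u = 0: a zero value of u says nothing about \bar{u}.
print(jax.grad(lambda u: u * 3.0)(0.0))  # 3.0
```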


Syntax Q – what does \bar{x} mean here?

It's Giles' notation for reverse-mode autodiff: \bar{x} is the adjoint of x, i.e. the derivative of the final (scalar) output with respect to x.
