There’s been ongoing demand for more efficient matrix structures for reverse-mode autodiff in Stan. I’ve been thinking about this for years and @rtrangucci and @anon75146577 have renewed the plea.
Recap of where we’re at
To recap, the current matrix type in reverse-mode is Matrix<var, -1, -1>
. This is an Eigen structure that holds an array of M x N
values with in column-major order along with the row size and column size. This makes it easy to get and set entries in matrices. It also makes it very hard to grab the underlying double matrix corresponding to the values of the variables and the underlying double matrix corresponding to the adjoints, which we need for efficient matrix autodiff. As is, we wind up copying to and from those structures with every operation. That’s terribly inefficient and we’d like to do better.
The matrix structure we need for autodiff
The fundamental obstacle is that we want to have a structure like this:
struct matrix_vari {
double* val_; // use Map<MatrixXd>(val_) to use as matrix
double* adj_;
};
and then our usual pointer-to-implementation
struct matrix_var {
matrix_vari* vi_;
};
and we’d manage all the memory in our arena like other autodiff variables. Easy, right? Yes, but how do we get and set values? The getting is actually pretty easy. Setting is pretty much impossible with this structure.
Setting and getting in reverse-mode autodiff
The design of reverse-mode autodiff needs to ensure that whatever is pointed to at one point doesn’t change values. If you want some new type, you create a new node in the expression graph. So even when you see:
var x = 5;
x = x^2;
what is going on is that the first statement declares and creates a a vari
for the constant 5 (i.e., a new “independent variable”). At this point, the var
instance x
points to this vari
. Then when we execute the second statement, we create a brand new vari
for the square of the original vari
. This new vari
is then the value of the pointer in x
. But the original vari
never changes—it’s always the constant 5
.
Getting
I think the getting is doable with a little extra memory overhead that essentially creates a new vari
for every get operation with a unit partial linked back to the original matrix. That maintains integrity of the vari
everywhere with a bit of overhead.
Setting with the new matrix structure
We’d like to be able to do this:
matrix_var x(2, 3);
...
x[1, 1] = 5; // ???
But if we want to maintain the integrity of our original matrix, then we need to create an entire copy. That’s a ridiculous amount of overhead to do things like create an ad-hoc matrix (like for GPs).
A possible solution
How about a static matrix type that simply doesn’t allow setting? We can have a function to create one from an ordinary matrix or array:
matrix[4, 5] a;
for (i in 1:4) for (j in 1:5) a[i, j] = ...;
fixed_matrix[4, 5] a_st = a;
Then there’s a single copy and all subsequent matrix operations with a_st
are going to be efficient and local. We could declare parameters or data to be static_matrix
. And we could have things like our covariance functions for GPs or our Cholesky factorizations return static matrices.
For cases like this, it’d probably be a little less efficient than what we have now which does the fixed matrix construction once internally and wouldn’t require any autodiff redirection. The fixed_matrix
is going to need to store pointers to all of the original vari
out of which it was constructed and then push autodiff along the links with unit partials.