An update for everyone: I have spoken with @betanalpha and the sparsity structure has to be fixed for any sparse matrix that is a parameter or transformed parameter. This is due to some perfectly sensible assumptions that are used extensively in both the algorithm and the I/O (basically that the state can be described as a vector of fixed length).
So what does this mean for sparse matrices? Well it makes them more difficult in the language than anything else (so I’ll tag @Bob_Carpenter here because he might have opinions)
Sparse matrices as data
I see two options for constructing sparse matrices in the data block.
Option 1: From a user interpretability standpoint, the following would probably be the easiest. It specifies a sparse matrix as an (i,j,value) triple.
data {
int n_row;
int n_col;
int nnz;
int i[nnz];
int j[nnz];
double value[nnz];
sparse_matrix<n_row, n_col, i, j, value> A;
}
or Option 2: This makes more sense from an “internals” standpoint. It would specify the matrix in compressed column storage
data {
int n_row;
int n_col;
int nnz;
int inner_index[nnz];
int outer_index[n_col];
double value[nnz];
sparse_matrix<n_row, n_col, inner_index, outer_index, value> A;
}
The difference between option 1 and option 2: Option 1 is human readable. Option 2 can be easily coerced in to an Eigen sparse matrix using
Eigen::Map<SparseMatrix<T> > A(n_row, n_col, nnz, outer_index, inner_index, value);
Option 2 is also easily produced by Python and R sparse matrix packages (it’s the default internal storage for sparse matrices in R. I am not sure about python, but it would not be surprising).
A small alternative: We could also do something like sparse_matrix<n_row, n_col, inner_index, outer_index> A(value);
A preferred option
data {
sparse_matrix_structure A_struct;
sparse_matrix<A_struct> A;
}
where it is assumed that all of the structure information is passed to the Stan call with derived names like A_struct_nnz__
and A_struct_value__
(general pattern of the derived name is [name of sparse_matrix_structure object]_[internal storage name]__
).
This would make passing a sparse matrix exactly like passing a dense matrix, which is the ideal from a user perspective. In particular, the user would never need to know about the internals of sparse matrix storage (which is ideal).
This is a higher burden on the I/O, but it does specify a sparse matrix as a series of integers and vectors with a fixed naming convention, which is the easiest way to flatten them out into a text dump.
This also allows users to perform algebraic manipulations on the sparsity structure directly, which broadly speaking seems like a very good idea!
Sparse matrices as transformed data
Here the sparsity structure can be inferred so something like
data {
sparse_matrix_structure A_struct;
sparse_matrix<A_struct> A;
sparse_matrix_structure B_struct;
sparse_matrix<B_struct> B;
}
transformed data {
sparse_matrix C = A*B;
sparse_matrix D = A + B;
sparse_matrix E = 2*A;
}
We will also want to manipulate sparse matrix structures directly (they also have an algebra). So things like this would be good.
data {
sparse_matrix_structure A_struct;
sparse_matrix_structure B_struct;
sparse_matrix<A_struct> A;
sparse_matrix<B_struct> B;
}
transformed data {
sparse_matrix_structure C_struct = A_struct*B_struct;
sparse_matrix_structure D_struct = A_struct + B_struct;
sparse_matrix E = A*B;
sparse_matrix_structure E_struct = get_structure(E);
}
where get_structure()
returns the structure of a sparse matrix.
Sparse matrices as parameters
Here we need to specify everything but the value, which will be mapped to var
s in the C++. An example would be
data {
int n_row;
int n_col;
int nnz;
int inner_index[nnz];
int outer_index[n_col];
double value[nnz];
}
parameters {
sparse_matrix<n_row, n_col, inner_index, outer_index> A;
}
or the ideal alternative:
data {
sparse_matrix_structure A_struct;
}
parameters {
sparse_matrix<A_struct> A;
}
where A_struct
tells Stan to expect vectors named A_struct_nnz__
and A_struct_inner_index__
exist in the input data file.
Sparse matrices as transformed parameters
This is the tricky one. These sparse matrices must have fixed sparsity structures. So we can have things like
data {
sparse_matrix_structure A_struct;
sparse_matrix_structure B_struct;
}
transformed data {
sparse_matrix_structure product_struct = A_struct*B_struct;
}
parameters {
sparse_matrix<A_struct> A;
sparse_matrix<B_struct> B;
}
transformed parameters {
sparse_matrix<product_struct> C;
C = A*B; // This is ok!
C = A + B; // This will throw an error if the sparsity structure of A+B isn't the same as A*B
}
Question: Should we allow declarations like sparse_matrix<A_struct * B_struct> C;
to avoid needing the transformed data
block?
Sparse Matrices as temporary variables in the model block
There is no requirement to know the sparsity pattern before hand here. So, just like in the transformed data
block, we can derive the sparsity structure on the fly. Something like this could work:
data {
sparse_matrix_structure A_struct;
sparse_matrix_structure B_struct;
}
parameters {
sparse_matrix<A_struct> A;
sparse_matrix<B_struct> B;
}
model {
sparse_matrix C = A*B; // structure is derived
}
Sparse matrices in generated quantities
Here we need to know the sparsity structure for anything that is outputted. So in that way it’s a lot like the parameters
blocks.
But! If there was a matrix that did not have a constant sparsity structure that was used as a temporary matrix, we may need some functional of that matrix to be generated. That means we would need temporary variables in the generated quantities
blocks to have a flexible sparsity structure, while the output in the block must have fixed sparsity. An example would be
generated quantities {
vector[n_col] output_variable;
{
sparse_matrix C = A*B;
output_variable = diagonal_of_inverse_sparse(C);
}
}
or alternatively (where the sparse matrix isn’t explicitly constructed)
generated quantities {
vector[n_col] output_variable;
output_variable = diagonal_of_inverse_sparse(A*B);
}
This might seem like a esoteric use case but it really isn’t. If the sparse matrix with the inconstant sparsity structure is used as a precision matrix of a Gaussian, the diagonal of its inverse are the marginal variances, which are things that people will typically want.
Did I forget anything?
So I think I accidentally wrote a language spec. What is missing?