Is there an example? Maybe for Tuples, which I think is the last big feature that is “almost” in. Or MPI?
It is difficult to hit a moving target. (Although that’s basically what this thread will hopefully do - fix the target!)
I still do not know what `var_context` is. Is there a wiki page? Or an example?
A single design would work. `Eigen::SparseMatrix` can do most of the hard work for us in the math library. It needs to be column-major storage to be consistent with the rest of Stan (and also to avoid having to constantly override Eigen defaults). The reluctance to use a class that actually stores anything substantial makes the design easier (although there will be an efficiency hit, as we will have to do a lot of redundant calculations).
Casting sparse matrices to dense and dealing with mixed types shouldn’t be a problem. I think Eigen can deal with mixed types, but if it can’t we can specialize. (I mean, we’ll have to do all the derivatives anyway, so we might as well do the `prim` specializations if they’re needed.)
We’d essentially pass 3 vectors of different lengths and 2 scalars to define a sparse matrix `foo`. For the functional spec, there are basically two options:
- Details hidden from the user: Automatically generate consistent names, so declaring `sm foo` expects `foo_value__` (a `double` or `var` vector), `foo_inner__` and `foo_outer__` (`int` vectors), and `foo_nrow__` and `foo_ncol__` (`int`s) to be present. Obviously it would be nice to bundle these up in the interfaces (so PyStan or RStan can just pass a sparse matrix in its internal format). In some sense, this is consistent with how Python or R do sparse matrices.
- Users need to know how sparse matrices are stored: They are responsible for manipulating these vectors directly and naming them.
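To make the first option concrete, here is a sketch (in Python, since the interfaces would pass this format around) of a compressed-sparse-column encoding for one small matrix. The `foo_*__` names mirror the auto-generated names proposed above and are purely illustrative, not an agreed spec:

```python
# Hypothetical flattened CSC (compressed sparse column) storage for a
# declared `sm foo`. Encodes the 3x3 matrix:
#   [1 0 2]
#   [0 0 3]
#   [4 0 0]
foo_value__ = [1.0, 4.0, 2.0, 3.0]  # nonzero values, column by column
foo_inner__ = [0, 2, 0, 1]          # row index of each nonzero
foo_outer__ = [0, 2, 2, 4]          # start offset of each column (len = ncol + 1)
foo_nrow__ = 3
foo_ncol__ = 3

def coeff(i, j):
    """Entry (i, j): scan column j's stored entries; anything not stored
    is a structural zero."""
    for k in range(foo_outer__[j], foo_outer__[j + 1]):
        if foo_inner__[k] == i:
            return foo_value__[k]
    return 0.0
```

So the "3 vectors and 2 scalars" are exactly `foo_value__`, `foo_inner__`, `foo_outer__`, `foo_nrow__`, `foo_ncol__`, which is also the layout `Eigen::SparseMatrix` uses internally for column-major storage.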
A challenging challenge
(This is long and there is a question at the end. The tl;dr is that if the various Stan libraries need to know more about a sparse matrix than the fact that one exists, we might have some problems. Even needing to know how many non-zero elements there are will be problematic.)
If we have a fully operative sparse matrix type, we might need to change the sparsity structure “on the fly”. For instance, the transformed parameters block might concatenate two sparse matrices, or multiply them (which changes the sparsity structure). Now, the structure (and hence the number of `var`s stored in `foo_value__`) can be computed as transformed data as long as we only let people build sparse matrices of parameters from existing sparse matrices of parameters through addition, multiplication or concatenation…
A problem would occur if there was a branching statement where the two branches constructed sparse matrices with different sparsity structures. The below example is extreme (and probably not differentiable) but I can think of situations where this would be continuous.
transformed parameters {
  if (/* parameter-dependent condition */) {
    sm C = A * B;
  } else {
    sm C = A + B;
  }
}
Because the sparsity patterns of `A*B` and `A+B` can be very different, there’s no way to know from iteration to iteration how many `var`s would be in `C_value__`. If Stan thinks of the sparse matrix as being the vector `C_value__` (plus side information), then this is a situation where we don’t know at run time how many parameters are in the model.
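To see that the two branches really do store different numbers of values, here is a small sketch (not Stan, just illustrative Python) that computes the sparsity patterns of A + B and A * B symbolically, as sets of (row, col) positions:

```python
def add_pattern(a, b):
    """Sparsity pattern of A + B: the union of the two patterns."""
    return a | b

def mul_pattern(a, b):
    """Sparsity pattern of A * B: (i, j) is present whenever some k has
    (i, k) stored in A and (k, j) stored in B."""
    return {(i, j) for (i, k) in a for (k2, j) in b if k == k2}

# Example 2x2 patterns: A is diagonal, B has one off-diagonal entry.
A = {(0, 0), (1, 1)}
B = {(0, 1)}

sum_nnz = len(add_pattern(A, B))   # 3 values stored in C_value__
prod_nnz = len(mul_pattern(A, B))  # 1 value stored in C_value__
```

Here the sum branch needs three stored values while the product branch needs one, so the length of `C_value__` depends on which branch the parameters select.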
Mathematically, `C` is one single object, so if you think of it abstractly we just have a lot of “phantom `var`s” that are always zero. So it doesn’t break HMC to have the sparsity structure of C change with the parameters \theta, as long as C(\theta) is a continuous function of \theta.
This suggests to me that a sparse matrix implementation should really be an encapsulated object, so that Stan only thinks of it as one object with a function to access `nrow*ncol` entries (even though most of these are zero). So a sparse matrix implementation should allow someone interrogating a zero element to just get a chain of zeros.
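One possible shape for that encapsulation (illustrative only, not a design proposal): the object exposes all `nrow*ncol` entries, but structural zeros are produced on demand rather than stored, so the stored-value count is an internal detail.

```python
class SparseMatrix:
    """A sparse matrix presented as one atomic object: indexing works for
    every (i, j), but only the nonzeros are actually stored."""

    def __init__(self, nrow, ncol, outer, inner, values):
        self.nrow, self.ncol = nrow, ncol
        self.outer = outer    # column start offsets, len = ncol + 1
        self.inner = inner    # row index of each stored value
        self.values = values  # stored (potentially autodiff) values

    def __getitem__(self, idx):
        i, j = idx
        for k in range(self.outer[j], self.outer[j + 1]):
            if self.inner[k] == i:
                return self.values[k]
        return 0.0  # "phantom" entry: a zero with no derivative chain

# Behaves like the 2x2 matrix [[5, 0], [0, 0]] with only one stored value.
m = SparseMatrix(2, 2, outer=[0, 1, 1], inner=[0], values=[5.0])
```

With this interface, code interrogating a zero element just gets zeros, and the number of stored values can change between iterations without the consumer noticing.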
The easiest way around this problem would be to just not let the user do things like this. But we do need to be able to add or multiply or block together sparse matrices. Otherwise sparse matrices will be very limited. And absent a linter, I can’t see how we can stop people writing that sort of branching code because locally everything is perfectly fine.
Question to @Bob_Carpenter (as language guru) and @betanalpha (because it’s only the algorithm that should care about this): is it possible for Stan to treat a sparse matrix type as a single atomic thing, rather than as an array of scalar types? If it can, then the problem I just described will be fixable. If it can’t, someone is going to need to work out what we can and can’t have. (Also @betanalpha because the adaptation will probably have some opinions about this…)