Is there an example? Maybe for Tuples, which I think is the last big feature that is “almost” in. Or MPI?
It is difficult to hit a moving target. (Although that’s basically what this thread will hopefully do - fix the target!)
I still do not know what `var_context` is. Is there a wiki page? Or an example?
A single design would work. `Eigen::SparseMatrix` can do most of the hard work for us in the math library. It needs to be column-major storage to be consistent with the rest of Stan (and also to avoid having to constantly override Eigen defaults). The reluctance to use a class that actually stores anything substantial makes the design easier (although there will be an efficiency hit, as we will have to do a lot of redundant calculations).
Casting sparse matrices to dense and dealing with mixed types shouldn’t be a problem. I think Eigen can deal with mixed types, but if it can’t we can specialize. (I mean, we’ll have to do all the derivatives anyway, so we might as well do the `prim` specializations if they’re needed.)
We’d essentially pass 3 vectors of different lengths and 2 scalars to define a sparse matrix `foo`. For the functional spec, there are basically two options:
- Details hidden from the user: Automatically generate consistent names, so declaring `sm foo` expects `foo_value__` (a `double` or `var` vector), `foo_inner__` and `foo_outer__` (`int` vectors), and `foo_nrow__` and `foo_ncol__` (`int`s) to be present. Obviously it would be nice to bundle these up in the interfaces (so PyStan or RStan can just pass a sparse matrix in its internal format). In some sense, this is consistent with how Python or R do sparse matrices.
- Users need to know how sparse matrices are stored: They are responsible for manipulating these vectors directly and naming them.
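To make the first option concrete, here is a sketch (in Python, since the interfaces would pass this format around) of a compressed-sparse-column encoding for one small matrix. The `foo_*__` names mirror the auto-generated names proposed above and are purely illustrative, not an agreed spec:

```python
# Hypothetical flattened CSC (compressed sparse column) storage for a
# declared `sm foo`. Encodes the 3x3 matrix:
#   [1 0 2]
#   [0 0 3]
#   [4 0 0]
foo_value__ = [1.0, 4.0, 2.0, 3.0]  # nonzero values, column by column
foo_inner__ = [0, 2, 0, 1]          # row index of each nonzero
foo_outer__ = [0, 2, 2, 4]          # start offset of each column (len = ncol + 1)
foo_nrow__ = 3
foo_ncol__ = 3

def coeff(i, j):
    """Entry (i, j): scan column j's stored entries; anything not stored
    is a structural zero."""
    for k in range(foo_outer__[j], foo_outer__[j + 1]):
        if foo_inner__[k] == i:
            return foo_value__[k]
    return 0.0
```

So the "3 vectors and 2 scalars" are exactly `foo_value__`, `foo_inner__`, `foo_outer__`, `foo_nrow__`, `foo_ncol__`, which is also the layout `Eigen::SparseMatrix` uses internally for column-major storage.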
A challenging challenge
(This is long and there is a question at the end. The tl;dr is that if the various Stan libraries need to know more about a sparse matrix than the fact that one exists, we might have some problems. Even needing to know how many non-zero elements there are will be problematic.)
If we have a fully operative sparse matrix type, we might need to change the sparsity structure “on the fly”. For instance, the transformed parameters block might concatenate two sparse matrices, or multiply them (which changes the sparsity structure). Now, the structure (and hence the number of `var`s stored in `foo_value__`) can be computed as transformed data as long as we only let people build sparse matrices of parameters from existing sparse matrices of parameters through addition, multiplication or concatenation…
A problem would occur if there was a branching statement where the two branches constructed sparse matrices with different sparsity structures. The below example is extreme (and probably not differentiable) but I can think of situations where this would be continuous.
transformed parameters {
  if (/* parameter-dependent condition */) {
    sm C = A * B;
  } else {
    sm C = A + B;
  }
}
Because the sparsity patterns of `A*B` and `A+B` can be very different, there’s no way to know from iteration to iteration how many `var`s would be in `C_value__`. If Stan thinks of the sparse matrix as being the vector `C_value__` (plus side information), then this is a situation where we don’t know at run time how many parameters are in the model.
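To see that the two branches really do store different numbers of values, here is a small sketch (not Stan, just illustrative Python) that computes the sparsity patterns of A + B and A * B symbolically, as sets of (row, col) positions:

```python
def add_pattern(a, b):
    """Sparsity pattern of A + B: the union of the two patterns."""
    return a | b

def mul_pattern(a, b):
    """Sparsity pattern of A * B: (i, j) is present whenever some k has
    (i, k) stored in A and (k, j) stored in B."""
    return {(i, j) for (i, k) in a for (k2, j) in b if k == k2}

# Example 2x2 patterns: A is diagonal, B has one off-diagonal entry.
A = {(0, 0), (1, 1)}
B = {(0, 1)}

sum_nnz = len(add_pattern(A, B))   # 3 values stored in C_value__
prod_nnz = len(mul_pattern(A, B))  # 1 value stored in C_value__
```

Here the sum branch needs three stored values while the product branch needs one, so the length of `C_value__` depends on which branch the parameters select.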
Mathematically, `C` is one single object, so if you think of it abstractly we just have a lot of “phantom `var`s” that are always zero. So it doesn’t break HMC to have the sparsity structure of C change with the parameters \theta, as long as C(\theta) is a continuous function of \theta.
This suggests to me that a sparse matrix implementation should really be an encapsulated object, so that Stan only thinks of it as one object with a function to access `nrow*ncol` entries (even though most of these are zero). So a sparse matrix implementation should allow someone interrogating a zero element to just get a chain of zeros.
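One possible shape for that encapsulation (illustrative only, not a design proposal): the object exposes all `nrow*ncol` entries, but structural zeros are produced on demand rather than stored, so the stored-value count is an internal detail.

```python
class SparseMatrix:
    """A sparse matrix presented as one atomic object: indexing works for
    every (i, j), but only the nonzeros are actually stored."""

    def __init__(self, nrow, ncol, outer, inner, values):
        self.nrow, self.ncol = nrow, ncol
        self.outer = outer    # column start offsets, len = ncol + 1
        self.inner = inner    # row index of each stored value
        self.values = values  # stored (potentially autodiff) values

    def __getitem__(self, idx):
        i, j = idx
        for k in range(self.outer[j], self.outer[j + 1]):
            if self.inner[k] == i:
                return self.values[k]
        return 0.0  # "phantom" entry: a zero with no derivative chain

# Behaves like the 2x2 matrix [[5, 0], [0, 0]] with only one stored value.
m = SparseMatrix(2, 2, outer=[0, 1, 1], inner=[0], values=[5.0])
```

With this interface, code interrogating a zero element just gets zeros, and the number of stored values can change between iterations without the consumer noticing.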
The easiest way around this problem would be to just not let the user do things like this. But we do need to be able to add or multiply or block together sparse matrices. Otherwise sparse matrices will be very limited. And absent a linter, I can’t see how we can stop people writing that sort of branching code because locally everything is perfectly fine.
Question to @Bob_Carpenter (as language guru) and @betanalpha (because it’s only the algorithm that should care about this): is it possible for Stan to treat a sparse matrix type as a single atomic thing, rather than as an array of scalar types? If it can, then the problem I just described will be fixable. If it can’t, someone is going to need to work out what we can and can’t have. (Also @betanalpha because the adaptation will probably have some opinions about this…)