Sparse matrix roadmap

I’d like to use the sparse matrix x dense vector product more, but it would be nice if it were actually faster than generating a bajillion vars, and in the current implementation it’s not. I’ve heard talk about a sparse matrix roadmap, but I haven’t seen a document (other than https://github.com/stan-dev/stan/wiki/Functional-Spec:-Sparse-Matrix-Data-Types). Is there a newer roadmap doc? I’d like to see what the current design considerations are.
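
For concreteness, here’s roughly the pattern I’m using now (just a sketch; the data names are made up, and the CSR arrays w/v/u would come from something like rstan::extract_sparse_parts on the R side):

data {
  int<lower=1> N;            // rows of the sparse matrix X
  int<lower=1> K;            // columns of X
  int<lower=0> nnz;          // number of non-zero entries in X
  vector[nnz] w;             // non-zero values of X in CSR order
  int v[nnz];                // column index of each value in w
  int u[N + 1];              // row-start indices into w
  vector[N] y;
}
parameters {
  vector[K] beta;
  real<lower=0> sigma;
}
model {
  // sparse matrix times dense vector; w, v, and u get re-validated on every call
  y ~ normal(csr_matrix_times_vector(N, K, w, v, u, beta), sigma);
}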

The matrix-vector product is implemented already, maybe in rstan. (I think @bgoodri did it. At the very least there is a full implementation somewhere on Discourse.)

I’m not sure what you’re looking for with a roadmap. If you’re thinking about implementing a sparse matrix type, there are a lot of serious design conversations to be had with the TWG, because it doesn’t cleanly fit into the current language, math, or interface designs.

Actually, I implemented it, in both stan-dev/math and rstan.

Bob added some checks that I had missed and it ended up being slower than anybody wanted it to be. It’s still very much worth using to avoid memory limits for things like splines. The speed hit is a bummer b/c most of it comes from not having a direct sparse data type that can be checked once. I’ve brought up having a specific sparse data matrix that could be checked once, but the general response was that the group wanted a holistic solution rather than piecemeal development.

My question is whether there is more discussion written down somewhere specific about what direction the TWG wants to take the design. If I have to do a private branch with a sparse data matrix, that’s fine, but it would be nice to make it something that could contribute to a broader design in the future. I don’t think more discussion is useful right now b/c my horizon is a few weeks, not a few years.

You implemented the fast one? Because it shouldn’t make all those awful dot product varis anymore? Ben’s implementation mapped to an Eigen::SparseMatrix and computed the derivatives directly.

I would say that for consistency with the rest of Stan, everything should be done in column-major format.

AFAIK the code hasn’t changed but sometimes things happen and I don’t know. My understanding is that we still have the same problem and we were never making extra varis.

Here’s the main file I’m talking about; is there more, or an alternative implementation somewhere? https://github.com/stan-dev/math/commits/develop/stan/math/prim/mat/fun/csr_matrix_times_vector.hpp

lol, this is why I ask these questions on Discourse: not being at Columbia or in some of the big meetings, sometimes everything changes and I don’t notice till I look at the git history.

There is this in the rstanarm repo


but doing the autodiff was actually faster than doing precomputed_gradients.

That’s the slow implementation. There’s a good one on the forum. Currently on my phone, but you should be able to find it.

Thanks for pointing that out; I recall the discussion now. That’s the implementation that, based on previous discussion, can’t be in stan-dev/math b/c it doesn’t meet the error-checking standards.

That’s the kind of situation I’d like to avoid, where add-on package code introduces features that can’t ever merge back and lets you write unsafe models. So it sounds like the answer to my question is that there’s no roadmap doc for sparse matrix types in Stan.

Error checking isn’t hard to add iirc

The error checking IS the speed hit given this input format; the algorithms in Ben’s code and in my code are very similar (I copy to the implicit smaller dot_product and take the custom gradient, whereas Ben apparently squeezes out some more performance by taking the autodiff gradients and avoiding the copies).

With a sparse-matrix data-only type you could do the checks once.
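
To make the idea concrete, something along these lines is what I have in mind (hypothetical syntax, none of this exists in the language; the point is just that the structure is fixed and validated once when the data is read):

data {
  int<lower=1> N;
  int<lower=1> K;
  sparse_matrix[N, K] X;   // hypothetical data-only sparse type: structure and
                           // values are read and checked once, at data load
  vector[N] y;
}
parameters {
  vector[K] beta;
  real<lower=0> sigma;
}
model {
  y ~ normal(X * beta, sigma);   // product wouldn’t need to re-check the CSR arrays
}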

A sparse matrix class would require the TWG to make some decisions on:

  • what a Stan type needs to look like

  • how heavy our data types can be in the math library

  • what it’s supposed to do

  • how we can declare structure to be constant (or if we should?)

I don’t think this requires the TWG, but yes, it’s good to get it involved. What this actually requires is a discussion / proposal about the points you mentioned. I don’t think we got far enough before.

what a Stan type looks like

We currently have things like simplex and cov_matrix. What would an ideal sparse type look like? How would it be constructed? Putting down something we can look at and build off of would be very useful. Perhaps start with what’s natural; then we can talk about what technical challenges there are to implementing it that way.

Does this look like…

sparse_matrix[N, M, idx_r, idx_c, vals] A;

or something else?
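
Another (equally hypothetical) option would be a plain declaration plus a constructor-style function, so the structure arguments don’t live in the type itself:

transformed data {
  // to_sparse_matrix is made up here, purely to illustrate the alternative:
  // build the sparse matrix once from its triplet representation
  sparse_matrix[N, M] A = to_sparse_matrix(N, M, idx_r, idx_c, vals);
}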

how heavy our data types can be in the math library

The heaviness shouldn’t matter much. Right now, our containers are either std::vector<T> or Eigen::Matrix<T, R, C>. The only concrete proposal for something heavier was std::complex<T>. There’s no reason we can’t have something heavier, other than that it’s work. That leads right to your next point…

what it’s supposed to do

That’s what we need to decide. Let’s say we could declare:

sparse_matrix[N, M] A;

then are we allowed to do stuff like matrix multiplication? What’s the output type? We’d need the checks to be extended to sparse_matrix, etc. It’s just a matter of what operations / functions are allowed for this new type.

We don’t have to roll it all out at once. Maybe it’s just one operation that works now, with a list of others to implement.
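
Purely as a sketch of the kinds of decisions that come up (hypothetical syntax again; assume A is the sparse_matrix[N, M] from above and beta is a vector[M] parameter), each line below forces a choice about the result type and the checks involved:

transformed parameters {
  vector[N] mu = A * beta;    // sparse matrix * dense vector: dense vector out?
  matrix[N, N] S = A * A';    // sparse * sparse: is the result sparse or dense?
  real m = mean(A);           // do reductions count the implicit zeros or not?
}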

how we can declare structure to be constant (or if we should?)

Wouldn’t it be the same way we deal with everything else? If it’s a primitive type, it’s constant; if it’s of type stan::math::var, it’s not?

The best way to think of sparse matrices is as a structured data type (like ragged arrays) that comes with a set of algebraic operations that act on both the values and the structure (this is where they differ from ragged arrays). So the number of vars might change even if the number of parameters does not: for example, adding two sparse matrices with different sparsity patterns gives a result whose pattern, and hence number of vars, depends on both inputs.

Oh, and @syclik, is there a skeleton for these roadmaps, like the skeleton for a GitHub issue or PR? It would be very useful for smoothing out this process.

I wish there were! But sadly no. There do exist wonderful examples put together by Bob from when we discussed this type of thing. They should be in the old dev list archive.

But there’s also a good reason why something like this doesn’t have a template: it doesn’t happen often. Meaning, I don’t think we’ve added a new Stan type since the LKJ stuff. In some sense, you’re all in uncharted territory and you just need to make a strong proposal. I’m sure that’ll be OK, and then it’s the hard work of implementing it properly.

A set of targets would be good. What questions would we need to answer?

That’s super tough to answer before anyone even knows what you’re trying to do!

I don’t think anyone would be looking for a full specification, but writing out how it would affect users would really help. If you’re able to explain how it looks for users, the rest is implementation details. The first thing is to just describe, as clearly as possible, how users would work with it. As a decent guide, maybe look at the user guide and write doc at that level; that should suffice.

For example:

  • Are you trying to add a new Stan type? If so, what does the declaration look like and why?

  • Is there a way to construct it after declaration? Why or why not?

  • What operations need to be defined? Or what’s the minimal set of operations needed?

Btw, I should have mentioned that this isn’t a requirement. If you wrote a well-specced issue and a set of pull requests that implemented it cleanly and were really easy to read, you wouldn’t actually need to make a proposal at all! You’d still need the user-facing doc that I mentioned.

If you just wanted to implement it in the math library without it being part of the language, then it’s even easier. If you can fit within the API that’s already there, just show it’s better. If it’s different but better, that’s great, as long as it has good design, good coding, and doesn’t add too much technical debt. Of course, that’s much harder to do in a vacuum, but it can be done that way.