In the “things we should discuss” category for sparse matrices, we need to think about whether they should be stored in row or column major formats. Here’s what currently exists:

Row major:
– Stan math uses this in its one supported sparse matrix operation csr_matrix_times_vector
– My memory is that reverse autodiff for the cholesky used to use row-major ordering because @rtrangucci implemented Giles’ algorithm which happened to be coded that way (or that’s my view from the old doc). This has been superseded, but it may be a consideration when we put the sparse functionality in properly (including a var specialisation of sparse cholesky). That being said, there’s no real reason we can’t re-interpret Smith’s algorithm in row-major form (he didn’t write it in either direction) and there’s a moderately good chance it will be tricky to program regardless of the choice between row-major and column-major.

Column major:
– This is the default for Eigen.

So is there any compelling reason to choose one over the other, or has this just been a series of historical quirks?

I’m mainly worried about speed and usefulness into the future. For the moment, there’s no Stan sparse matrix type, so we’re always passing a CSR or CSC representation to a function. My concern is that they look very similar (both three vectors that usually have the same name) and it would be very easy to accidentally pass the wrong one. I can’t think of a way to check this (that’s why people encapsulate their sparse matrices!)

The entire reason for that function as far as my implementation went was that it could be used to represent a design matrix larger than fits in memory. It was CSR because we could implement the matrix-times-vector function as a dot product without any shuffling around and the argument was that it would make it faster… but I’m not sure that it did. We messed up a little because I profiled some early versions and they were faster beyond a certain size but the final implementation wasn’t (as @bgoodri later discovered).

It might be a waste of time to do much more here until we’re willing to commit to a sparse type that can encapsulate this. At the C++ level it would be straightfoward to have both (as in Eigen) but IIRC the real problem is making enough functions accept sparse types for them to be useful. @Bob_Carpenter: wasn’t this part of a grant at some point? Did that get funded?

The reason for doing this now is that @imadmali is putting a bunch of new models into rstanarm. For one class of models (ICAR models) it can be done fairly efficiently in the Stan language (using a method due to @mitzimorris), but for models with higher-order interactions, this is really inconvenient, and it’s much easier to just have a quadratic form v’Qv for a (data) sparse matrix Q. Imad tried implementing this using the existing csr_matrix_times_vector function but that was twice as slow as Mitzi’s version, which uses a well-deployed dot_self.

That makes sense. I propose we just bite the bullet and introduce sparse matrix types (csc_matrix and csr_matrix) which are compatible with limited operations. Otherwise we’re setting ourselves up for a mess of poorly structured code and complicated errors. I’d rather have users complain the sparse types are not useful for their pet project than complain about errors they don’t understand.

multi-dimensional array orientation is not performance-sensitive whereas lots of sparse algorithms benefit from one orientation or the other—that’s why libraries like Eigen don’t just pick up.

I’m starting to agree with @sakrejda that we really do need to bite the bullet and start building a sparse matrix class. Because a lot of this is going to force stuff onto users (even though the intended users are basically rstanarm, the experiment will escape the lab). Then we can have io that allows for both row and column major input and let Eigen deal with everything internally.

For the moment, when we aren’t doing anything that allows vars in the matrices, an io that takes the matrix is CSR, CSC, or (i,j,val) and then constructs the matrix in transformed_data doesn’t seem too bad to me. To some extent this layer could be removed in some of the interfaces (like rstan), but I can’t see how to do it conveniently in cmdstan (unless you want to read from a file).

In Stan, they’re row major, because they’re vector<vector<T> >. At least if you meant Stan arrays. But they’re a mess, because the rows aren’t even guaranteed to be contiguous with each other in memory and there’s a lot of pointer chasing.

You’re right that there’s no matrix issues here since they can’t be used in matrix ops.

But the nice thing about arrays to keep in mind is that there’s no copy required to return an element—it’s just a reference. Whereas in a matrix, returning a row or column requires a copy. So if you want to access a matrix only by row, it’s better to have an array of row vectors rather than a matrix.

What I meant is that with dense arrays all you need to change is the direction you iterate in. In sparse matrix ops certain operations are faster with CSC and others are faster with CSR. I can see that there’s some grey area for dense stuff as well.

It’s probably worth noting that this is a fools game for sparse matrices. If A is data, then evaluating y = A * v is fastest in CSR, but evaluating the reverse derivative \bar{v} = A^T*\bar{y} (to use Giles’ notation) is fastest in CSC.

Hey, if you know more than what the books say don’t hide it! :)

It sounds as though we could legitimately just choose since most Stan models involve a pretty large set of operations and any particular model is not likely to eke out an advantage by choosing among the two. That sounds great to me.

I’ve got a PhD in this :p (As Céline says: It’s all coming back to me now)

But what you did is exactly right for what you implemented. My job for when I get a short break is to see if dragging your code into eigen and specialising an autodiff function will speed it up. (My guts say it will on the derivatives only, because your code is basically what Eigen does for the evaluation).

The only small annoyance is that the current code allows vars in the matrix as well as in the vector. I’m not 100% sure how to do autodiff in this case (see my other question), but I guess I can just specialise to the current code.

I’m not sure if there’s a test rig for “Is branch A faster than branch B”.