Should sparse matrices in Stan be row major or column major?

So MPI works because when A is row-major, the for loop in A*v is embarrassingly parallelisable. A’*v is parallelisable only in column-major form.
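To make that concrete, here is a minimal sketch (not Stan’s actual implementation) of a CSR (row-major) sparse matrix–vector product. Each output entry depends only on its own row, so the outer loop parallelises trivially, e.g. with OpenMP or by handing blocks of rows to MPI workers.

```cpp
#include <vector>

// Minimal CSR (row-major) storage; illustrative only, not Stan's classes.
struct csr_matrix {
  int rows, cols;
  std::vector<double> val;   // nonzero values, stored row by row
  std::vector<int> col_idx;  // column index of each nonzero
  std::vector<int> row_ptr;  // row_ptr[i]..row_ptr[i+1] delimits row i
};

// y = A * v: each row's dot product is independent of every other row,
// so the outer loop can be split across threads/processes with no communication.
std::vector<double> multiply(const csr_matrix& A, const std::vector<double>& v) {
  std::vector<double> y(A.rows, 0.0);
  // #pragma omp parallel for  // embarrassingly parallel over rows
  for (int i = 0; i < A.rows; ++i) {
    double sum = 0.0;
    for (int k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
      sum += A.val[k] * v[A.col_idx[k]];
    y[i] = sum;
  }
  return y;
}
```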

Obviously when A is symmetric, they’re both good.

One option may be to store both A and A’ in row-major form. It really depends on what the speedup is. If we have a stan_sparse class, we can play with all of this away from “prying eyes”.

So it’s like the dictionary of keys (dok) format, which is then translated to csr/csc.

Is there a speed difference between dok and csr/csc? We could use the dok format for sparse matrices, and with SSD the random access should be O(1)? dok will take more space, but will MCMC limits come before RAM limits? Is the extra saved space really needed?

Sure, the dok could be translated to row / column major without compression. (LIL / COO).

I don’t see any reason not to use Eigen’s built-in type (which has its own internal storage that we don’t see, for very good reasons). The constructors are dok (standard, implemented as a vector of triplets), and CSC/CSR through Eigen::Map<Eigen::SparseMatrix>.
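As a rough sketch of what those two construction routes look like in Eigen (the dimensions and values below are just placeholders):

```cpp
#include <Eigen/SparseCore>
#include <vector>

int main() {
  // Route 1: dok-style construction from a vector of (i, j, value) triplets.
  std::vector<Eigen::Triplet<double>> trips;
  trips.emplace_back(0, 0, 1.5);
  trips.emplace_back(2, 1, -0.5);
  trips.emplace_back(4, 3, 2.0);

  Eigen::SparseMatrix<double> A(5, 4);           // column-major (CSC) by default
  A.setFromTriplets(trips.begin(), trips.end());

  // Route 2: wrap existing CSC arrays without copying, via Eigen::Map.
  // outer = column start offsets, inner = row indices, vals = nonzero values.
  int outer[] = {0, 1, 2, 2, 3};
  int inner[] = {0, 2, 4};
  double vals[] = {1.5, -0.5, 2.0};
  Eigen::Map<const Eigen::SparseMatrix<double>> B(5, 4, 3, outer, inner, vals);

  return 0;
}
```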


OK, that’s a slight lie. Once we have a working version in Eigen, it would probably be worth exploring some more scalable options (like PETSc, which is a very efficient but VERY ugly C MPI library for linear algebra that lives under all of the “let’s buy a bigger supercomputer” calculations out there). But ideally this would be abstracted FAR away from the user.

> Is there a speed difference between dok and csr/csc?

Yeah, the dok format is just for building. You wouldn’t use it for actual calculations.

Where are you proposing adding hashmaps? They provide good random access, but they’re horribly slow to iterate over, which is what you need to do in arithmetic: iterate over the sparse matrix and directly access the dense one. And that’s still horrible compared to dense-dense.
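For illustration, a hash-map-backed dok might look like the sketch below (the names are made up): insertion and random lookup are O(1) on average, but iterating over the entries chases hash buckets in no particular order, whereas csr/csc iteration streams through contiguous arrays.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical dok storage: key packs (row, col) into one 64-bit integer.
struct dok_matrix {
  std::unordered_map<std::uint64_t, double> entries;

  static std::uint64_t key(std::uint32_t i, std::uint32_t j) {
    return (static_cast<std::uint64_t>(i) << 32) | j;
  }

  // O(1) average insertion/update: great for incremental construction.
  void set(std::uint32_t i, std::uint32_t j, double x) { entries[key(i, j)] = x; }

  // O(1) average random access.
  double get(std::uint32_t i, std::uint32_t j) const {
    auto it = entries.find(key(i, j));
    return it == entries.end() ? 0.0 : it->second;
  }
  // Note: iterating `entries` visits nonzeros in hash order, so a
  // matrix-vector product can't stream through memory the way csr/csc does.
};
```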

Another construction alternative is (i, j, x[i,j]) triples or three parallel arrays. Eigen does that.

We’re rather locked into Eigen because it’s the only lib that takes the templating seriously enough that we can just plug in Stan types. So unless you want to do purely analytic derivatives, it pretty much has to be Eigen.

I have a Wiki page somewhere outlining what would be needed to have the typing for that. It’s going to get ugly. But it’s doable. Once Mitzi gets tuples in, we should have a way open to add more data types easily. Right now, the basic type inference is a hurdle.

Then, the issue is whether we support all the operations on them. Do we add sparse addition, multiplication, etc.? That’s a lot of work, but if any of the basic operations are missing, it’s going to be horribly confusing for users. We’d probably also need sparse vectors and row vectors. Then what about the operations that mix dense and sparse?

This is still quadratic, so it doesn’t scale to highly sparse situations. Whatever we do, if we add a sparse matrix type we probably want operations to convert both ways.
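In Eigen terms, converting in both directions is already close to a one-liner; something like this sketch:

```cpp
#include <Eigen/Dense>
#include <Eigen/SparseCore>

int main() {
  Eigen::MatrixXd dense = Eigen::MatrixXd::Identity(4, 4);

  // Dense -> sparse: keep only the (numerically) nonzero entries.
  Eigen::SparseMatrix<double> sparse = dense.sparseView();

  // Sparse -> dense: fill the explicit zeros back in.
  Eigen::MatrixXd back = Eigen::MatrixXd(sparse);
  return 0;
}
```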

Those are different types. They have different algorithms to construct them and use them.

If you’re going to have a covariance matrix parameter, you need to use the covariance matrix type, or the Cholesky factor type for a Cholesky factor of a covariance matrix parameter. Enforcing it is simple — we just use a Cholesky-factor unconstrained representation, which is lower triangular with a strictly positive diagonal. So there’s no validation if it’s a parameter. Now if it’s a transformed parameter or generated quantity, those constraints are just used for validation.

A lot of it’s new. They’ve been pretty great at defining C++ interfaces up until now.

I think users should be informed about the storage format because they need that information to write efficient Stan programs. This is the situation with matrices—users don’t get a choice, they get column-major matrices and we (really Eigen) just have to transpose internally where necessary.

Agreed. Symmetric matrix itself would be nice. Eigen does this with views of one triangle of the matrix, but it still allocates N^2 storage for an N x N symmetric matrix. (More tradeoffs in indexing speed vs. storage compactness.)
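For reference, Eigen’s triangular-view approach looks roughly like this: only one triangle is ever read, but the full N x N buffer is still allocated.

```cpp
#include <Eigen/Dense>

int main() {
  const int N = 3;
  Eigen::MatrixXd A(N, N);
  A << 4, 0, 0,
       1, 3, 0,
       2, 5, 6;   // only the lower triangle is meaningful here

  Eigen::VectorXd v = Eigen::VectorXd::Ones(N);

  // Treat A as symmetric using just its lower triangle; the upper
  // triangle is never touched, but N*N doubles are still stored.
  Eigen::VectorXd y = A.selfadjointView<Eigen::Lower>() * v;
  return 0;
}
```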

“anathema” might be a bit strong. Certainly it would be a pain to figure out, because we’d have to precompute the sparsity pattern of the result if we want a sized declaration.

We can call better algorithms with this kind of knowledge. Now we have to force users to call specific algorithms for specific types of inputs.

The real problem we have is that there are two situations in which multi_normal is called. In one, you have done something like a GP and monkeyed with a matrix yourself. That should be checked. In another, you’ve declared a covariance matrix parameter, so you already know it’s positive definite. I don’t think there’s a performance penalty on calling multi_normal, though, because it has to do the factoring anyway and can just report if it goes off the rails.

The trick is to choose an internal representation. Then supply flexible ways to construct instances.

It’s true that the total number of vari instances on the stack isn’t fixed, but the number that represents parameters is.

OK, all the Bens are clearly on one side of this debate and everyone else is on the other. You’re going to need some more Bens to prevail!

Yes, Stan enforces any declared constraint. If you don’t want the check, don’t declare it with a constraint.

The issue you’re pointing out from the manual is that there’s a distinction between constrained type and runtime type, the former informing transforms for parameters and error checking for the other blocks.

But that won’t work with data coming in from the outside. It should work internally, but you won’t be able to declare sparse_matrix[N, N] because it needs to know its sparsity structure. We could use something like a builder pattern here to store up a collection of tuples and then build all at once, but building incrementally is really difficult, because after each assignment we’d still need to have a valid sparse matrix.
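A minimal sketch of that builder idea (all the names below are hypothetical, not an agreed design): accumulate (i, j, value) entries cheaply, then produce the compressed matrix in one shot.

```cpp
#include <Eigen/SparseCore>
#include <vector>

// Hypothetical builder: collect entries incrementally, compress once at the end.
class sparse_matrix_builder {
 public:
  sparse_matrix_builder(int rows, int cols) : rows_(rows), cols_(cols) {}

  // O(1) amortized per entry; duplicates are summed when the matrix is built.
  void add(int i, int j, double x) { trips_.emplace_back(i, j, x); }

  // Single conversion to the compressed (CSC) representation.
  Eigen::SparseMatrix<double> build() const {
    Eigen::SparseMatrix<double> m(rows_, cols_);
    m.setFromTriplets(trips_.begin(), trips_.end());
    return m;
  }

 private:
  int rows_, cols_;
  std::vector<Eigen::Triplet<double>> trips_;
};
```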

The easier way to do this is the way Eigen does with (i, j, x[i,j]) triples. Very easy not to mess up and not much overhead if you’re only using it in construction.

Absolutely. I think everyone’s on the same page with this. That is what guided the current way @sakrejda did the sparse matrix-vector multiply we have now.

Right—easy to convert at construction stage.

This is what Eigen tries to do internally with things like transpose: decide whether the memory allocation is worth it for transposing. It will be worth it in matrix-matrix multiply for reasonably big matrices.

Well, the natural format will be fastest. But construction is likely to be dominated by use later. And if we’re talking about transformed data, the construction cost is constant, not per iteration.

> You’re going to need some more Bens to prevail!

> The easier way to do this is the way Eigen does with (i, j, x[i,j]) triples.

Yeah, I guess the way I wrote that constructor is kinda bad because of the quadratic thing. Working with the i_s, j_s, v_s three-array format seems better overall. It just seems really simple to do and nice. Mind changed!

Thank Eigen—it’s not the most efficient, but probably has the best efficiency vs. ease of use tradeoffs for matrices with a lot of sparsity.