I just sped up a model by a factor of 10 by replacing a matrix-vector product with a loop that only processes the non-zero elements of the matrix. The matrix was a design matrix for the random-effects part of the model (so the matrix was data for Stan). I assume rstanarm is in the same situation.
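To make the trick concrete, here is a minimal sketch (plain Python, not the actual model code) of the difference: a dense matrix-vector product spends a multiply-add on every entry, zeros included, while a loop over the stored non-zeros (coordinate/COO triplets) skips all the zero-times-whatever work. All names here are hypothetical.

```python
def dense_matvec(A, x):
    """Dense product: O(rows * cols) multiply-adds, zeros included."""
    return [sum(A[i][j] * x[j] for j in range(len(x)))
            for i in range(len(A))]

def sparse_matvec(rows, cols, vals, x, n_rows):
    """COO product: one multiply-add per stored non-zero."""
    y = [0.0] * n_rows
    for i, j, v in zip(rows, cols, vals):
        y[i] += v * x[j]
    return y

# A 3x4 design matrix with only three non-zeros.
A = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 2.0, 0.0],
     [0.0, 3.0, 0.0, 0.0]]
rows, cols, vals = [0, 1, 2], [0, 2, 1], [1.0, 2.0, 3.0]
x = [1.0, 2.0, 3.0, 4.0]

assert dense_matvec(A, x) == sparse_matvec(rows, cols, vals, x, 3)
```

In the autodiff setting the saving is even bigger than the flop count suggests, since every `0 * x[j]` in the dense version would otherwise still put a node on the AD stack.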
So this seems to be a major problem, as the AD stack gets inflated by all those zero-times-whatever products.
Do we have a good solution now, or one on the horizon? Looping will do, but matrix expressions are a lot more expressive in the model.
Sorry if this was discussed already and I missed it.
There’s a fix in RStan (@bgoodri did it), and I can write one into math if it’s needed (the derivatives are not hard).
How much of a pain would it be to swap your code to column-major? That layout is more useful for other sparse operations…
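For readers unfamiliar with the distinction, here is an illustrative sketch (hypothetical code, not from the thread) of the column-major layout being suggested: compressed sparse column (CSC) stores the non-zeros column by column, so column-oriented operations walk contiguous memory.

```python
def csc_matvec(vals, row_idx, col_ptr, x, n_rows):
    """y = A @ x with A stored in compressed-sparse-column (CSC) form:
    vals[k] lives in row row_idx[k]; column j owns entries
    vals[col_ptr[j]:col_ptr[j + 1]]."""
    y = [0.0] * n_rows
    for j in range(len(col_ptr) - 1):          # one pass per column
        for k in range(col_ptr[j], col_ptr[j + 1]):
            y[row_idx[k]] += vals[k] * x[j]
    return y

# A 3x4 matrix with three non-zeros, stored by column:
# column 0 holds 1.0 (row 0), column 1 holds 3.0 (row 2),
# column 2 holds 2.0 (row 1), column 3 is empty.
vals, row_idx = [1.0, 3.0, 2.0], [0, 2, 1]
col_ptr = [0, 1, 2, 3, 3]

y = csc_matvec(vals, row_idx, col_ptr, [1.0, 2.0, 3.0, 4.0], 3)
assert y == [1.0, 6.0, 6.0]
```

Row-major (CSR) is the mirror image, with a pointer array per row instead of per column; converting user code between the two mostly means transposing which index array is compressed.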
Thanks @sakrejda for the link.
I recall @bgoodri did come up with some proposals… is that documented somewhere?
Doing something about these sparse matrices would be a huge win. As I said, I got a problem to sample 10x faster, and the matrices were not even that large, just full of zeros.
So yes, if you have stuff for Stan Math, then we should seriously consider bringing it into math.
The original reason the code was made as slow as it is was opposition to handing users a foot-gun, so adding Ben’s version to math would mean re-hashing that debate. Once tuples are in, I think we could build a type on top of them that carries the sparse matrix component vectors and is constructed in transformed data; then we could officially skip all the error checking.
Hmmm… I recall. Have we considered adding an explicit unsafe version? If “unsafe” is made clear in the function name, then one could presumably skip the error checking, no?
Has that idea been discussed? I don’t think it is ideal to have critical speed fixes in rstanarm alone, and tuples are still some way off, as I understand it.
It wasn’t a huge change in speed. It might have actually been slower in doubles, but we gained a bit more speed by relying on the autodiff instead of
Yes. That’s why I always tell people not to do that and why we have a sparse matrix-vector multiply function.
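The sparse multiply mentioned here is Stan’s `csr_matrix_times_vector(m, n, w, v, u, b)`, which takes the matrix as compressed-sparse-row component vectors: `w` the non-zero values, `v` their column indices, and `u` the position in `w` where each row starts. As a rough sketch of its semantics in plain Python (Stan itself uses 1-based indices; this sketch is 0-based):

```python
def csr_matrix_times_vector(m, n, w, v, u, b):
    """y = A @ b for an m x n matrix A given in CSR form:
    row i owns values w[u[i]:u[i + 1]] in columns v[u[i]:u[i + 1]]."""
    y = [0.0] * m
    for i in range(m):
        for k in range(u[i], u[i + 1]):
            y[i] += w[k] * b[v[k]]
    return y

# 3x4 matrix with non-zeros 1.0 at (0,0), 2.0 at (1,2), 3.0 at (2,1).
w, v, u = [1.0, 2.0, 3.0], [0, 2, 1], [0, 1, 2, 3]
b = [1.0, 2.0, 3.0, 4.0]
assert csr_matrix_times_vector(3, 4, w, v, u, b) == [1.0, 6.0, 6.0]
```

These component vectors (`w`, `v`, `u`) are exactly what the tuple-based sparse type discussed above would bundle together and build once in transformed data.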
Getting sparse matrices is backed up behind tuples, functions, and ragged arrays. So I doubt it’s going to happen in 2018. We could reprioritize if someone has a workable sparse matrix/vector proposal—we are currently stuck without a design.
When you say proposal what exactly do you mean? Is there an example for, say, ragged arrays (or MPI or ODEs or something else recent)?
Ragged arrays: https://github.com/stan-dev/stan/wiki/Ragged-array-spec
It’s not particularly coherent because I just pasted a new proposal in front of the old one (everything before “What is a ragged array?”). You’ll also see that what we’ll have to do to declare these things in general will be horrendous.
This seems like a fine state of things, because between tuples and ragged arrays you get many of the tools you need to implement sparse matrices anyway…
I took this opportunity to flesh out the
to match our current thinking. I could use some feedback on the actual syntax for tuple types and expressions.
On the other hand, the
is very much a work in progress.