While we all wait for tuples and sparse matrix support, I finally had a chance to think about solutions to the problem of sparse model matrices that don't require new language features, and it turns out Stan has a pretty good one. Everything is wrapped up nicely here: https://github.com/sakrejda/hierarchy/

The motivating perspective is nicely described in a book by James Hodges, which presents an awful lot of models as (mostly sparse) model matrices. Yay. In Stan you end up using these things as components for describing GLMs quite a bit, so it would be nice to have first-class sparse matrix support, but that's… ongoing…

You can hack sparse matrices with ragged arrays… of course those are *not* in Stan either, so you hack those too. There's an example script in a little package demo that you can install with `devtools::install_github("sakrejda/hierarchy")`

and run an example script:

```
script_file = system.file('quick-check/check-02.R', package = 'hierarchy')
source(script_file)
```

The upshot is that with a 10,000 x 200 matrix that's about 2% non-zero, you can easily write a linear model that's twice as fast (and uses less memory) if you take

```
X * betas
```

and replace it with

```
for (i in 1:N)
  mu[i] = sum(X_vec[starts[i]:stops[i]] .* betas[NZE[starts[i]:stops[i]]]);
```

where `X_vec` is the non-zero entries of `X`, `NZE` is the column indices of the non-zero entries of `X`, and `i` is the row index. The `starts`/`stops` here are where the `i`'th row starts and stops in the flattened vectors `X_vec` and `NZE`. The data stored here is similar to the CSR sparse matrix format.
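To make the indexing concrete, here is a small sketch in Python (the package itself is R, so all the names here are just illustrative): scipy's CSR format stores exactly these three arrays, with `indptr` playing the role of `starts`/`stops`, and slicing out each row's entries reproduces the dense product.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
X = rng.random((5, 4))
X[X < 0.6] = 0.0           # zero out most entries to make X sparse
betas = rng.random(4)

csr = sparse.csr_matrix(X)
X_vec = csr.data           # non-zero entries, row by row
NZE = csr.indices          # column index of each non-zero entry (0-based here)
# csr.indptr marks the row boundaries: row i spans indptr[i]:indptr[i+1]
# (Stan would use 1-based starts/stops vectors instead).
mu = np.array([
    X_vec[csr.indptr[i]:csr.indptr[i + 1]]
    @ betas[NZE[csr.indptr[i]:csr.indptr[i + 1]]]
    for i in range(X.shape[0])
])
print(np.allclose(mu, X @ betas))  # the per-row loop matches the dense product
```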

In both cases here the `X`/`X_vec` can be either data, a parameter, or a transformed parameter. That means it can be used to construct measurement-error models and other models where you need to have parameters as (some) entries of the `X` matrix.

The package referenced above is where I'm collecting tools like this, but the function used here is quite short:

```
#' Create a sparse list model matrix
#'
#' @param formula formula to use
#' @param data where to look up terms
#' @return list with Stan-friendly components
#' @export
flat_mm = function(formula, data = NULL, ...) {
  mm = Matrix::sparse.model.matrix(formula, data, ...)
  # Column indices of the non-zero entries, row by row:
  nze = apply(mm, 1, function(x) which(x != 0))
  N = length(nze)
  # Where each row starts and stops in the flattened vectors:
  stops = cumsum(sapply(nze, length))
  starts = c(1, head(stops, -1) + 1)
  nze = unlist(nze)
  # The non-zero entries themselves:
  X_vec = apply(mm, 1, function(x) x[x != 0])
  mml = list(
    N = N,
    P = ncol(mm),
    N_NZE = length(nze),
    NZE = nze,
    starts = starts, stops = stops,
    X_vec = unlist(X_vec)
  )
  # Attach the response variable under its own name:
  t = terms(formula)
  pos = attr(t, 'response')
  dn = all.vars(t)[pos]
  mml[[dn]] = data[[dn]]
  return(mml)
}
```
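For comparison, the packing step by itself (non-zero values, their column indices, and per-row start/stop offsets, 1-based as Stan expects) can be sketched in Python; `pack_rows` is just an illustrative name here, not part of the package:

```python
import numpy as np

def pack_rows(X):
    """Flatten a dense matrix into Stan-friendly ragged-array pieces (1-based)."""
    X = np.asarray(X)
    nze_by_row = [np.nonzero(row)[0] for row in X]       # non-zero columns, per row
    stops = np.cumsum([len(idx) for idx in nze_by_row])  # last position of each row
    starts = np.concatenate(([1], stops[:-1] + 1))       # first position of each row
    return {
        "N": X.shape[0],
        "P": X.shape[1],
        "N_NZE": int(stops[-1]),
        "NZE": np.concatenate(nze_by_row) + 1,           # 1-based column indices
        "starts": starts,
        "stops": stops,
        "X_vec": np.concatenate([row[idx] for row, idx in zip(X, nze_by_row)]),
    }

X = np.array([[1.0, 0.0, 2.0],
              [0.0, 0.0, 3.0]])
d = pack_rows(X)
print(d["X_vec"])               # [1. 2. 3.]
print(d["NZE"])                 # [1 3 3]
print(d["starts"], d["stops"])  # [1 3] [2 3]
```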

The version of sparse matrix multiplication we have implemented (in C++, sigh) is slower than this Stan-language version (good job on that language design, @Bob_Carpenter et al.…!). Too bad I didn't see this when I first coded the C++ version… it might make sense to replace that…