Save this file as speed.stan, which reimplements what csr_matrix_times_vector
does but without as much checking:
functions {
vector slow(int m, int n, vector w, int[] v, int[] u, vector b) {
return csr_matrix_times_vector(m, n, w, v, u, b);
}
vector fast(int m, int n, vector w, int[] v, int[] u, vector b) {
vector[m] out;
int pos = 1;
for (k in 1:m) {
int nnz = u[k + 1] - u[k];
if (nnz > 0) {
vector[nnz] left;
vector[nnz] right;
for (j in 1:nnz) {
right[j] = b[v[pos]];
left[j] = w[pos];
pos = pos + 1;
}
out[k] = dot_product(left, right);
}
else out[k] = 0;
}
return out;
}
}
In R, do
rstan::expose_stan_functions("speed.stan")
K <- 1000L
X <- matrix(rbinom(K^2, size = 1, prob = 0.25), nrow = K, ncol = K)
b <- rnorm(K)
parts <- rstan::extract_sparse_parts(X)
all.equal(with(parts, slow(m = nrow(X), n = ncol(X), w, v, u, b)),
with(parts, fast(m = K, n = K, w, v, u, b))) # TRUE
Now do, in R
slow_time <- system.time(replicate(1000, dim(with(parts, slow(m = K, n = K, w, v, u, b)))))
fast_time <- system.time(replicate(1000, dim(with(parts, fast(m = K, n = K, w, v, u, b)))))
The “slow” version with csr_matrix_times_vector
is over three times slower than the “fast” version that avoids some of the checking. The “fast” version is still checking that all the indices are in bounds.
It isn’t so much that checking is not a worthwhile idea (although it might be for expensive checks), it is that checking doubles and ints every leapfrog step is not worthwhile after the first evaluation. Also, there is currently no way to turn all the checks off for packages like rstanarm that come with compiled models and their own units tests. And even if we refactored things so that was possible, it would be difficult to turn expensive checks off and leave cheap checks on.