This was always doable and will be done once I finish this grant application. (Sorry I didn’t make that clear.) But this is the minimal use of sparse matrices. We still need to have the language discussion if we’re going to manipulate them in-language, speed up lmer, or make them user-friendly.

You could, but a lot of the time spent in csr_matrix_times_vector is on checking validity of v and u, which don’t change. So, rstanarm needs a version without all that. If there were a way to construct a sparse_matrix in data or transformed data, then in each leapfrog step, you could just check that the matrix was conformable before multiplying by a vector.
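As a sketch of what such a check-free version might look like (hypothetical helper name, plain doubles instead of vars, and 0-based indices rather than Stan's 1-based convention), assuming u and v were validated once up front:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: a CSR matrix-vector product that skips the
// per-call validity checks on the index arrays u and v, on the
// assumption they were validated once (e.g., in transformed data).
// Only conformability is checked on each call.
std::vector<double> csr_times_vector_unchecked(
    std::size_t rows, std::size_t cols,
    const std::vector<double>& w,  // non-zero values
    const std::vector<int>& v,     // column index of each value (0-based here)
    const std::vector<int>& u,     // row start offsets, size rows + 1
    const std::vector<double>& b) {
  assert(b.size() == cols);        // cheap conformability check only
  std::vector<double> out(rows, 0.0);
  for (std::size_t k = 0; k < rows; ++k)
    for (int idx = u[k]; idx < u[k + 1]; ++idx)
      out[k] += w[idx] * b[v[idx]];
  return out;
}
```

The inner loop then touches only the stored non-zeros, with no range checks on v in the hot path.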

w is actually not checked except for its size being the same as that of v. But it checks that every element of v (which is already restricted to be an int[]) is in the allowable range.

I was thinking that if w is data then the sparse matrix is static and it makes sense to only check it once in the transformed data block.

It’s probably worth thinking of cheaper ways to check that the sparsity on the left- and right-hand sides of expressions like A_plus_B = A + B; match, without having to do those expensive pre-checks.
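One possible cheap check (a sketch with hypothetical helper names): two CSR matrices of the same shape share a sparsity pattern exactly when their row-offset and column-index arrays are equal, which is a plain O(nnz) integer comparison with no per-index bounds checking. When the patterns match, addition reduces to adding the value arrays:

```cpp
#include <cstddef>
#include <vector>

// Sketch: two same-shaped CSR matrices have matching sparsity iff
// their row-offset (u) and column-index (v) arrays are identical.
// Comparing two int vectors is O(nnz), with no per-element range checks.
bool same_sparsity(const std::vector<int>& u_a, const std::vector<int>& v_a,
                   const std::vector<int>& u_b, const std::vector<int>& v_b) {
  return u_a == u_b && v_a == v_b;
}

// If the patterns match, A + B reduces to adding the value arrays.
std::vector<double> add_same_pattern(const std::vector<double>& w_a,
                                     const std::vector<double>& w_b) {
  std::vector<double> out(w_a.size());
  for (std::size_t i = 0; i < w_a.size(); ++i)
    out[i] = w_a[i] + w_b[i];
  return out;
}
```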

I mean you can. But the sparse matrix construction doesn’t seem that expensive, so doing it twice isn’t a killer.

Maybe the Doxygen for precomputed_gradients is not correct?

/**
 * This function returns a var for an expression that has the
 * specified value, vector of operands, and vector of partial
 * derivatives of value with respect to the operands.
 *
 * @param[in] value The value of the resulting dependent variable.
 * @param[in] operands operands.
 * @param[in] gradients vector of partial derivatives of result with
 * respect to operands.
 * @return An auto-diff variable that uses the precomputed
 * gradients provided.
 */
inline var precomputed_gradients(double value,
                                 const std::vector<var>& operands,
                                 const std::vector<double>& gradients) {
  return var(new precomputed_gradients_vari(value, operands, gradients));
}

For the k-th element of the LHS vector, the partial derivatives are the non-zero elements of the matrix in the k-th row and the operands are the corresponding elements of b, right?
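Right. As a plain-double sketch of that indexing (hypothetical helper; in Stan Math the operands would be vars handed to precomputed_gradients together with the partials):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sketch of the claim above: for row k of a CSR product A * b, the
// partial derivatives are the non-zero values stored for that row,
// and the operands are the elements of b picked out by the column
// indices (0-based here). In Stan Math, operands would be vars passed
// to precomputed_gradients along with the gradients vector.
std::pair<std::vector<double>, std::vector<double>>
row_operands_and_gradients(std::size_t k,
                           const std::vector<double>& w,  // non-zero values
                           const std::vector<int>& v,     // column indices
                           const std::vector<int>& u,     // row offsets
                           const std::vector<double>& b) {
  std::vector<double> operands, gradients;
  for (int idx = u[k]; idx < u[k + 1]; ++idx) {
    operands.push_back(b[v[idx]]);  // b elements row k touches
    gradients.push_back(w[idx]);    // d(out[k]) / d(b[v[idx]]) = w[idx]
  }
  return {operands, gradients};
}
```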

But starting from triplets you could just compute the transpose once and store both sets of indices. All you do is flip the triplets, and the storage/computation cost would scale with the number of non-zero entries.
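The flip is just a swap of the row and column entries in each triplet; a sort then restores row-major order for CSR assembly of the transpose. A minimal sketch (hypothetical helper, O(nnz log nnz) for the sort, O(nnz) extra storage for the second index set):

```cpp
#include <algorithm>
#include <tuple>
#include <vector>

// (row, col, value) triplet.
using Triplet = std::tuple<int, int, double>;

// Sketch: transposing a triplet-form matrix swaps row and col in each
// triplet; sorting then yields row-major order for the transpose, so
// its CSR index arrays can be assembled alongside the original's.
std::vector<Triplet> transpose_triplets(std::vector<Triplet> t) {
  for (auto& [r, c, x] : t)
    std::swap(r, c);
  std::sort(t.begin(), t.end());  // row-major order for the transpose
  return t;
}
```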

For sparse assembly another option is hash-assembly. For reordering, is it possible to use a third-party graph partitioner such as METIS? This would help an MPI solver, especially if we want to do domain decomposition later on.
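The hash-assembly idea can be sketched as follows (hypothetical helper; keys pack (row, col) into one integer, and duplicate entries at the same coordinate accumulate, giving expected O(nnz) assembly regardless of insertion order):

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Sketch of hash-assembly: triplet contributions are accumulated in a
// hash map keyed by the packed (row, col) coordinate, so duplicates
// at the same position sum together. The map can then be dumped and
// sorted once to produce CSR arrays.
std::unordered_map<std::uint64_t, double> hash_assemble(
    const std::vector<int>& rows, const std::vector<int>& cols,
    const std::vector<double>& vals) {
  std::unordered_map<std::uint64_t, double> acc;
  for (std::size_t i = 0; i < rows.size(); ++i) {
    std::uint64_t key = (static_cast<std::uint64_t>(rows[i]) << 32)
                        | static_cast<std::uint32_t>(cols[i]);
    acc[key] += vals[i];  // duplicate (row, col) entries accumulate
  }
  return acc;
}
```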