# A proposal for sparse matrices and GPs in Stan

#21

That does make at least as much sense as stan/math/rev/mat/fun/multiply.hpp does. But in this case should I be using `precomputed_gradients()`?

#22

I was thinking that if `w` is data then the sparse matrix is static and it makes sense to only check it once in the `transformed data` block.

Itâs probably worth thinking of cheaper ways to check that the sparsity on the left and right-hand sides of expressions like `A_plus_B = A + B;` match without having to do those expensive pre-checks.

I mean you can. But the sparse matrix construction doesn't seem that expensive, so doing it twice isn't a killer.

#23

It seems simpler to do it with `precomputed_gradients`, but apparently I am doing it wrong:

``````cpp
stan::math::vector_v
csr_matrix_times_vector2(const int& m,
                         const int& n,
                         const Eigen::VectorXd& w,
                         const std::vector<int>& v,
                         const std::vector<int>& u,
                         const stan::math::vector_v& b,
                         std::ostream* pstream__) {
  // Map the CSR arrays (w, v, u) as a row-major Eigen sparse matrix.
  Eigen::Map<const Eigen::SparseMatrix<double, Eigen::RowMajor> >
      sm(m, n, w.size(), &u[0], &v[0], &w[0]);
  Eigen::VectorXd result = sm * value_of(b);
  stan::math::vector_v out(m);
  int pos = 0;
  for (int k = 0; k < m; k++) {
    int nnz = u[k + 1] - u[k];
    if (nnz > 0) {
      std::vector<stan::math::var> operands(nnz);
      std::vector<double> gradients(nnz);
      for (int j = 0; j < nnz; j++) {
        operands[j] = b.coeff(v[j]);
        gradients[j] = w[pos];
        pos++;
      }
      out.coeffRef(k)
          = stan::math::precomputed_gradients(result.coeff(k), operands, gradients);
    } else {
      out.coeffRef(k) = 0.0;
    }
  }
  return out;
}
``````

It seems that it is not chaining things together correctly because the gradients of the affected parameters come out too small:

`````` param idx           value           model     finite diff           error
          0         1.60449         14.7638         14.7638     7.70049e-07
          1         1.87157         -555.97         -555.97    -4.90843e-07
          2         1.04997        -1.04997        0.934164        -1.98413
        ...     (all parameters between 2 and 48 inclusive have the wrong derivatives)
         48         1.01874         722.119         663.832         58.2875
``````

#24

I donât think thatâs going to work. The precomputed gradient needs to know how to do the matrix-vector product, so it needs `u` and `v`.

#25

Maybe the Doxygen for `precomputed_gradients` is not correct?

``````cpp
/**
 * This function returns a var for an expression that has the
 * specified value, vector of operands, and vector of partial
 * derivatives of value with respect to the operands.
 *
 * @param[in] value The value of the resulting dependent variable.
 * @param[in] operands operands.
 * @param[in] gradients vector of partial derivatives of result with
 * respect to operands.
 * @return An auto-diff variable that uses the precomputed
 *   gradients provided.
 */
var precomputed_gradients(double value,
                          const std::vector<var>& operands,
                          const std::vector<double>& gradients);
``````

For the `k`-th element of the LHS vector, the partial derivatives are the non-zero elements of the matrix in the `k`-th row and the operands are the corresponding elements of `b`, right?

#26

From the autodiff paper, the key step in the `precomputed_gradients` implementation is `varis_[i]->adj_ += adj_ * gradients_[i];`

For this you need `varis_[i]->adj_ += A.transpose() * adj_` to be implemented, which this won't do.

My code upthread should work, but I haven't had a chance to actually test it out.

#27

I think I understand your code, but the `A.transpose()` step is going to be on the expensive side.

But more generally, why does the `precomputed_gradients` function exist if not to handle cases like this? The wiki page Bob wrote recommends this approach.

#28

Is there an in-place transpose?

#29

Not for sparse matrices.

#30

I think the problem is that the gradient isn't `w`, it's the sparse matrix `(w,u,v)`, so you're not passing it the right information.

Itâs possible that thereâs a clever way to use precomputed gradients when the gradient is neither a dense matrix or a vector, but iâm not sure how.

#31

But starting from triplets you could just compute the transpose once and store both sets of indexes. All you do is flip triplets and the storage/computation cost would scale with the number of non-zero entries.

#32

Actually, the problem is that I am a dumbass. In my original code, this loop is indexed wrong:

``````cpp
for (int j = 0; j < nnz; j++) {
  operands[j] = b.coeff(v[j]);  // should be b.coeff(v[pos]);
}
``````

Now it matches the finite differences:

``````cpp
stan::math::vector_v
csr_matrix_times_vector2(const int& m,
                         const int& n,
                         const Eigen::VectorXd& w,
                         const std::vector<int>& v,
                         const std::vector<int>& u,
                         const stan::math::vector_v& b,
                         std::ostream* pstream__) {
  // Map the CSR arrays (w, v, u) as a row-major Eigen sparse matrix.
  Eigen::Map<const Eigen::SparseMatrix<double, Eigen::RowMajor> >
      sm(m, n, w.size(), &u[0], &v[0], &w[0]);
  Eigen::VectorXd result = sm * value_of(b);
  stan::math::vector_v out(m);
  int pos = 0;
  for (int k = 0; k < m; k++) {
    int nnz = u[k + 1] - u[k];
    if (nnz > 0) {
      std::vector<stan::math::var> operands(nnz);
      std::vector<double> gradients(nnz);
      for (int j = 0; j < nnz; j++) {
        operands[j] = b.coeff(v[pos]);  // fixed: index with pos, not j
        gradients[j] = w[pos];
        pos++;
      }
      out.coeffRef(k)
          = stan::math::precomputed_gradients(result.coeff(k), operands, gradients);
    } else {
      out.coeffRef(k) = 0.0;
    }
  }
  return out;
}
``````

#33

I hate to disagree here, but you're not. What I don't know is how my injunction not to insult people applies to self-insults!

Great news.

#34

Well, usually it's just self-flagellation followed by asking questions that could be answered by 5 minutes with the manual, so Ben's case might be unique.

#35

For sparse assembly, another option is hash-based assembly. For reordering, is it possible to use a third-party graph partitioner such as METIS? That would help an MPI solver, especially if we want to do domain decomposition later on.

My 2 cents…

#36

METIS is a licensing nightmare, but other third-party partitioners are open source.