That does make at least as much sense as stan/math/rev/mat/fun/multiply.hpp does. But in this case should I be using `precomputed_gradients()`?

# A proposal for sparse matrices and GPs in Stan

**bgoodri**#21

**Daniel_Simpson**#22

I was thinking that if `w` is data then the sparse matrix is static and it makes sense to only check it once in the `transformed data` block.

Itâs probably worth thinking of cheaper ways to check that the sparsity on the left and right-hand sides of expressions like `A_plus_B = A + B;`

match without having to do those expensive pre-checks.

I mean you can. But the sparse matrix construction doesn't seem that expensive, so doing it twice isn't a killer.

**bgoodri**#23

It seems simpler to do it with `precomputed_gradients`, but apparently I am doing it wrong:

```
stan::math::vector_v
csr_matrix_times_vector2(const int& m, const int& n,
                         const Eigen::VectorXd& w,
                         const std::vector<int>& v,
                         const std::vector<int>& u,
                         const stan::math::vector_v& b,
                         std::ostream* pstream__) {
  Eigen::Map<const Eigen::SparseMatrix<double, Eigen::RowMajor> >
    sm(m, n, w.size(), &u[0], &v[0], &w[0]);
  Eigen::VectorXd result = sm * value_of(b);
  stan::math::vector_v out(m);
  int pos = 0;
  for (int k = 0; k < m; k++) {
    int nnz = u[k + 1] - u[k];
    if (nnz > 0) {
      std::vector<stan::math::var> operands(nnz);
      std::vector<double> gradients(nnz);
      for (int j = 0; j < nnz; j++) {
        operands[j] = b.coeff(v[j]);
        gradients[j] = w.coeff(pos++);
      }
      out.coeffRef(k) = precomputed_gradients(result.coeff(k),
                                              operands, gradients);
    } else {
      out.coeffRef(k) = 0.0;
    }
  }
  return out;
}
```

It seems that it is not chaining things together correctly because the gradients of the affected parameters come out too small:

```
param idx    value      model  finite diff        error
        0  1.60449    14.7638      14.7638  7.70049e-07
        1  1.87157    -555.97      -555.97 -4.90843e-07
        2  1.04997   -1.04997     0.934164     -1.98413
...  all parameters between 2 and 48 inclusive have the wrong derivatives
       48  1.01874    722.119      663.832      58.2875
```

**Daniel_Simpson**#24

I donât think thatâs going to work. The precomputed gradient needs to know how to do the matrix-vector product, so it needs `u`

and `v`

.

**bgoodri**#25

Maybe the Doxygen for `precomputed_gradients` is not correct?

```
/**
 * This function returns a var for an expression that has the
 * specified value, vector of operands, and vector of partial
 * derivatives of value with respect to the operands.
 *
 * @param[in] value The value of the resulting dependent variable.
 * @param[in] operands operands.
 * @param[in] gradients vector of partial derivatives of result with
 * respect to operands.
 * @return An auto-diff variable that uses the precomputed
 * gradients provided.
 */
inline var precomputed_gradients(double value,
                                 const std::vector<var>& operands,
                                 const std::vector<double>& gradients) {
  return var(new precomputed_gradients_vari(value, operands, gradients));
}
```

For the `k`-th element of the LHS vector, the partial derivatives are the non-zero elements of the matrix in the `k`-th row and the operands are the corresponding elements of `b`, right?

**Daniel_Simpson**#26

From the autodiff paper, the key step in the `precomputed_gradients` implementation is `varis_[i]->adj_ += adj_ * gradients_[i];`. For this you need `varis_[i]->adj_ += A.transpose() * adj_` to be implemented, which this won't do.

My code upthread should do it, but I haven't had a chance to actually test it out.

**bgoodri**#27

I think I understand your code, but the `A.transpose()` step is going to be on the expensive side.

But more generally, why does the `precomputed_gradients` function exist if not to handle cases like this? The wiki page Bob wrote recommends this way.

**Daniel_Simpson**#30

I think the problem is that the gradient isn't `w`, it's the sparse matrix `(w, u, v)`, so you're not passing it the right information.

It's possible that there's a clever way to use precomputed gradients when the gradient is neither a dense matrix nor a vector, but I'm not sure how.

**sakrejda**#31

But starting from triplets you could just compute the transpose once and store both sets of indexes. All you do is flip triplets and the storage/computation cost would scale with the number of non-zero entries.

**bgoodri**#32

Actually, the problem is I am a dumbass. In my original code, this loop is indexed wrong

```
for (int j = 0; j < nnz; j++) {
  operands[j] = b.coeff(v[j]); // should be b.coeff(v[pos]);
  gradients[j] = w.coeff(pos++);
}
```

Now, it is matching with the finite differences:

```
stan::math::vector_v
csr_matrix_times_vector2(const int& m, const int& n,
                         const Eigen::VectorXd& w,
                         const std::vector<int>& v,
                         const std::vector<int>& u,
                         const stan::math::vector_v& b,
                         std::ostream* pstream__) {
  Eigen::Map<const Eigen::SparseMatrix<double, Eigen::RowMajor> >
    sm(m, n, w.size(), &u[0], &v[0], &w[0]);
  Eigen::VectorXd result = sm * value_of(b);
  stan::math::vector_v out(m);
  int pos = 0;
  for (int k = 0; k < m; k++) {
    int nnz = u[k + 1] - u[k];
    if (nnz > 0) {
      std::vector<stan::math::var> operands(nnz);
      std::vector<double> gradients(nnz);
      for (int j = 0; j < nnz; j++) {
        operands[j] = b.coeff(v[pos]);
        gradients[j] = w.coeff(pos++);
      }
      out.coeffRef(k) = precomputed_gradients(result.coeff(k),
                                              operands, gradients);
    } else {
      out.coeffRef(k) = 0.0;
    }
  }
  return out;
}
```

**Bob_Carpenter**#33

I hate to disagree here, but you're not. What I don't know is how my injunction not to insult people applies to self-insults!

Great news.

**sakrejda**#34

Well, usually it's just self-flagellation followed by asking questions that could be solved by 5 minutes with the manual, so Ben's case might be unique.

**yizhang**#35

For sparse assembly, another option is hash-based assembly. For reordering, is it possible to use a third-party graph partitioner such as METIS? This would help the MPI solver, especially if we want to do domain decomposition later on.

My 2 cents…

**Daniel_Simpson**#36

METIS is a licensing nightmare, but other third-party partitioners are open source.