That does make at least as much sense as stan/math/rev/mat/fun/multiply.hpp does. But in this case should I be using precomputed_gradients()?
I was thinking that if w is data then the sparse matrix is static and it makes sense to only check it once in the transformed data block.
It's probably worth thinking of cheaper ways to check that the sparsity on the left and right-hand sides of expressions like A_plus_B = A + B; match without having to do those expensive pre-checks.
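For instance, if both sides are already stored in CSR form, one cheap option (a hypothetical helper, not anything that exists in Stan Math today) is to compare the index arrays directly, which is O(nnz) and never touches the values:

#include <vector>

// Hypothetical helper: two CSR matrices share a sparsity pattern iff their
// row-start arrays u and column-index arrays v are identical.
inline bool same_csr_pattern(const std::vector<int>& u_a,
                             const std::vector<int>& v_a,
                             const std::vector<int>& u_b,
                             const std::vector<int>& v_b) {
  return u_a == u_b && v_a == v_b;
}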
I mean you can. But the sparse matrix construction doesn't seem that expensive, so doing it twice isn't a killer.
It seems simpler to do it with precomputed_gradients, but apparently I am doing it wrong:
stan::math::vector_v
csr_matrix_times_vector2(const int& m,
                         const int& n,
                         const Eigen::VectorXd& w,
                         const std::vector<int>& v,
                         const std::vector<int>& u,
                         const stan::math::vector_v& b,
                         std::ostream* pstream__) {
  Eigen::Map<const Eigen::SparseMatrix<double, Eigen::RowMajor> >
    sm(m, n, w.size(), &u[0], &v[0], &w[0]);
  Eigen::VectorXd result = sm * value_of(b);
  stan::math::vector_v out(m);
  int pos = 0;
  for (int k = 0; k < m; k++) {
    int nnz = u[k + 1] - u[k];
    if (nnz > 0) {
      std::vector<stan::math::var> operands(nnz);
      std::vector<double> gradients(nnz);
      for (int j = 0; j < nnz; j++) {
        operands[j] = b.coeff(v[j]);
        gradients[j] = w.coeff(pos++);
      }
      out.coeffRef(k) = precomputed_gradients(result.coeff(k),
                                              operands, gradients);
    } else {
      out.coeffRef(k) = 0.0;
    }
  }
  return out;
}
It seems that it is not chaining things together correctly because the gradients of the affected parameters come out too small:
param idx      value       model  finite diff         error
        0    1.60449     14.7638      14.7638   7.70049e-07
        1    1.87157     -555.97      -555.97  -4.90843e-07
        2    1.04997    -1.04997     0.934164      -1.98413
      ...   (all parameters between 2 and 48 inclusive have the wrong derivatives)
       48    1.01874     722.119      663.832       58.2875
I don't think that's going to work. The precomputed gradient needs to know how to do the matrix-vector product, so it needs u and v.
Maybe the Doxygen for precomputed_gradients is not correct?
/**
 * This function returns a var for an expression that has the
 * specified value, vector of operands, and vector of partial
 * derivatives of value with respect to the operands.
 *
 * @param[in] value The value of the resulting dependent variable.
 * @param[in] operands operands.
 * @param[in] gradients vector of partial derivatives of result with
 * respect to operands.
 * @return An auto-diff variable that uses the precomputed
 * gradients provided.
 */
inline var precomputed_gradients(double value,
                                 const std::vector<var>& operands,
                                 const std::vector<double>& gradients) {
  return var(new precomputed_gradients_vari(value, operands, gradients));
}
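For reference, a minimal usage sketch of that documented interface on a toy case, f(x, y) = x * y, with the partials computed by hand (the function name my_product is just illustrative):

#include <stan/math/rev/core.hpp>
#include <vector>

// Toy example of precomputed_gradients: return f = x * y with the
// hand-computed partials df/dx = y and df/dy = x attached.
stan::math::var my_product(const stan::math::var& x,
                           const stan::math::var& y) {
  double val = x.val() * y.val();
  std::vector<stan::math::var> operands{x, y};
  std::vector<double> gradients{y.val(), x.val()};
  return stan::math::precomputed_gradients(val, operands, gradients);
}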
For the k-th element of the LHS vector, the partial derivatives are the non-zero elements of the matrix in the k-th row and the operands are the corresponding elements of b, right?
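In CSR index terms (with the w/u/v naming above), that is

$$
y_k \;=\; \sum_{p = u_k}^{u_{k+1} - 1} w_p \, b_{v_p},
\qquad
\frac{\partial y_k}{\partial b_{v_p}} \;=\; w_p ,
$$

so the operands for row k are the entries of b indexed by v[u[k]], ..., v[u[k+1]-1] and the partials are the matching entries of w.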
From the autodiff paper, the key step in the precomputed_gradients implementation is

  varis_[i]->adj_ += adj_ * gradients_[i];

For this you need varis_[i]->adj_ += A.transpose() * adj_ to be implemented, which this won't do.
My code upthread should do it, but I haven't had a chance to actually test it out.
I think I understand your code, but the A.transpose() step is going to be on the expensive side.
But more generally, why does the precomputed_gradients function exist if not to handle cases like this? The wiki page Bob wrote recommends this way.
Is there an in-place transpose?
Not for sparse matrices.
I think the problem is that the gradient isn't w, it's the sparse matrix (w, u, v), so you're not passing it the right information.
It's possible that there's a clever way to use precomputed gradients when the gradient is neither a dense matrix nor a vector, but I'm not sure how.
But starting from triplets you could just compute the transpose once and store both sets of indexes: all you do is flip the triplets, and the storage/computation cost scales with the number of non-zero entries.
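A minimal sketch of that flip-the-triplets idea with Eigen (the function name and the std::pair return are illustrative only, not anything from Stan Math):

#include <Eigen/SparseCore>
#include <utility>
#include <vector>

// Build A and A^T from the same triplet list by swapping (row, col);
// both passes cost O(nnz) time and storage.
std::pair<Eigen::SparseMatrix<double, Eigen::RowMajor>,
          Eigen::SparseMatrix<double, Eigen::RowMajor> >
build_csr_and_transpose(int m, int n,
                        const std::vector<Eigen::Triplet<double> >& trips) {
  std::vector<Eigen::Triplet<double> > flipped;
  flipped.reserve(trips.size());
  for (const auto& t : trips)
    flipped.emplace_back(t.col(), t.row(), t.value());
  Eigen::SparseMatrix<double, Eigen::RowMajor> A(m, n), At(n, m);
  A.setFromTriplets(trips.begin(), trips.end());
  At.setFromTriplets(flipped.begin(), flipped.end());
  return std::make_pair(A, At);
}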
Actually, the problem is I am a dumbass. In my original code, this loop is indexed wrong:

for (int j = 0; j < nnz; j++) {
  operands[j] = b.coeff(v[j]);    // should be b.coeff(v[pos]);
  gradients[j] = w.coeff(pos++);
}
Now it is matching the finite differences:
stan::math::vector_v
csr_matrix_times_vector2(const int& m,
                         const int& n,
                         const Eigen::VectorXd& w,
                         const std::vector<int>& v,
                         const std::vector<int>& u,
                         const stan::math::vector_v& b,
                         std::ostream* pstream__) {
  // Map the CSR arrays (values w, column indexes v, row starts u) onto an
  // Eigen sparse matrix without copying.
  Eigen::Map<const Eigen::SparseMatrix<double, Eigen::RowMajor> >
    sm(m, n, w.size(), &u[0], &v[0], &w[0]);
  // Value of the product, computed once in doubles.
  Eigen::VectorXd result = sm * value_of(b);
  stan::math::vector_v out(m);
  int pos = 0;
  for (int k = 0; k < m; k++) {
    int nnz = u[k + 1] - u[k];
    if (nnz > 0) {
      // Operands are the entries of b touched by row k; the partials are
      // the matching non-zero values of the matrix.
      std::vector<stan::math::var> operands(nnz);
      std::vector<double> gradients(nnz);
      for (int j = 0; j < nnz; j++) {
        operands[j] = b.coeff(v[pos]);
        gradients[j] = w.coeff(pos++);
      }
      out.coeffRef(k) = precomputed_gradients(result.coeff(k),
                                              operands, gradients);
    } else {
      out.coeffRef(k) = 0.0;
    }
  }
  return out;
}
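For anyone who wants to reproduce that check outside a Stan model, here is a small standalone sketch along the same lines (the hand-picked 2x3 CSR arrays and the rest of the harness are purely illustrative), comparing the reverse-mode gradient of sum(A * b) against central finite differences:

#include <stan/math.hpp>
#include <Eigen/SparseCore>
#include <iostream>
#include <vector>
// assumes csr_matrix_times_vector2 from above is visible here

int main() {
  // 2 x 3 matrix with three non-zeros in CSR form.
  int m = 2, n = 3;
  Eigen::VectorXd w(3);
  w << 1.5, -2.0, 0.5;
  std::vector<int> v = {0, 2, 1};   // column index of each non-zero
  std::vector<int> u = {0, 2, 3};   // row k owns w[u[k]] .. w[u[k+1]-1]

  Eigen::VectorXd b0(3);
  b0 << 0.3, -1.2, 2.0;

  // Reverse-mode gradient of f(b) = sum(A * b).
  stan::math::vector_v b = b0.cast<stan::math::var>();
  stan::math::var f
      = stan::math::sum(csr_matrix_times_vector2(m, n, w, v, u, b, nullptr));
  f.grad();

  // Central finite differences of the same scalar function.
  Eigen::Map<const Eigen::SparseMatrix<double, Eigen::RowMajor> >
      sm(m, n, w.size(), &u[0], &v[0], &w[0]);
  const double eps = 1e-6;
  for (int i = 0; i < n; ++i) {
    Eigen::VectorXd bp = b0, bm = b0;
    bp(i) += eps;
    bm(i) -= eps;
    Eigen::VectorXd yp = sm * bp, ym = sm * bm;
    double fd = (yp.sum() - ym.sum()) / (2 * eps);
    std::cout << "param " << i << "  ad = " << b(i).adj()
              << "  fd = " << fd << std::endl;
  }
  stan::math::recover_memory();
  return 0;
}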
I hate to disagree here, but you're not. What I don't know is how my injunction not to insult people applies to self-insults!
Great news.
Well, usually it's just self-flagellation followed by asking questions that could be solved by 5 minutes with the manual, so Ben's case might be unique.
For sparse assembly, another option is hash assembly. For reordering, is it possible to use a third-party graph partitioner such as METIS? This would help an MPI solver, especially if we want to do domain decomposition later on.
My 2 cents…
METIS is a licensing nightmare, but other third-party partitioners are open source.