That does make at least as much sense as stan/math/rev/mat/fun/multiply.hpp does. But in this case should I be using precomputed_gradients()?
I was thinking that if w is data then the sparse matrix is static and it makes sense to only check it once in the transformed data block.
It's probably worth thinking of cheaper ways to check that the sparsity on the left and right-hand sides of expressions like A_plus_B = A + B; match without having to do those expensive pre-checks.
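For instance, if both sides are already stored in CSR form, one cheap option (a hypothetical helper, not anything that exists in Stan Math today) is to compare the index arrays directly, which is O(nnz) and never touches the values:

#include <vector>

// Hypothetical helper: two CSR matrices share a sparsity pattern iff their
// row-start arrays u and column-index arrays v are identical.
inline bool same_csr_pattern(const std::vector<int>& u_a,
                             const std::vector<int>& v_a,
                             const std::vector<int>& u_b,
                             const std::vector<int>& v_b) {
  return u_a == u_b && v_a == v_b;
}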
I mean you can. But the sparse matrix construction doesn't seem that expensive, so doing it twice isn't a killer.
It seems simpler to do it with precomputed_gradients, but apparently I am doing it wrong:
stan::math::vector_v
csr_matrix_times_vector2(const int& m,
                         const int& n,
                         const Eigen::VectorXd& w,
                         const std::vector<int>& v,
                         const std::vector<int>& u,
                         const stan::math::vector_v& b,
                         std::ostream* pstream__) {
  Eigen::Map<const Eigen::SparseMatrix<double, Eigen::RowMajor> >
    sm(m, n, w.size(), &u[0], &v[0], &w[0]);
  Eigen::VectorXd result = sm * value_of(b);
  stan::math::vector_v out(m);
  int pos = 0;
  for (int k = 0; k < m; k++) {
    int nnz = u[k + 1] - u[k];
    if (nnz > 0) {
      std::vector<stan::math::var> operands(nnz);
      std::vector<double> gradients(nnz);
      for (int j = 0; j < nnz; j++) {
        operands[j] = b.coeff(v[j]);
        gradients[j] = w.coeff(pos++);
      }
      out.coeffRef(k) = precomputed_gradients(result.coeff(k),
                                              operands, gradients);
    } else {
      out.coeffRef(k) = 0.0;
    }
  }
  return out;
}
It seems that it is not chaining things together correctly because the gradients of the affected parameters come out too small:
param idx      value       model  finite diff         error
        0    1.60449     14.7638      14.7638   7.70049e-07
        1    1.87157     -555.97      -555.97  -4.90843e-07
        2    1.04997    -1.04997     0.934164      -1.98413
      ...   (all parameters between 2 and 48 inclusive have the wrong derivatives)
       48    1.01874     722.119      663.832       58.2875
I don't think that's going to work. The precomputed gradient needs to know how to do the matrix-vector product, so it needs u and v.
Maybe the Doxygen for precomputed_gradients is not correct?
/**
 * This function returns a var for an expression that has the
 * specified value, vector of operands, and vector of partial
 * derivatives of value with respect to the operands.
 *
 * @param[in] value The value of the resulting dependent variable.
 * @param[in] operands operands.
 * @param[in] gradients vector of partial derivatives of result with
 * respect to operands.
 * @return An auto-diff variable that uses the precomputed
 * gradients provided.
 */
inline var precomputed_gradients(double value,
                                 const std::vector<var>& operands,
                                 const std::vector<double>& gradients) {
  return var(new precomputed_gradients_vari(value, operands, gradients));
}
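For reference, a minimal usage sketch of that documented interface on a toy case, f(x, y) = x * y, with the partials computed by hand (the function name my_product is just illustrative):

#include <stan/math/rev/core.hpp>
#include <vector>

// Toy example of precomputed_gradients: return f = x * y with the
// hand-computed partials df/dx = y and df/dy = x attached.
stan::math::var my_product(const stan::math::var& x,
                           const stan::math::var& y) {
  double val = x.val() * y.val();
  std::vector<stan::math::var> operands{x, y};
  std::vector<double> gradients{y.val(), x.val()};
  return stan::math::precomputed_gradients(val, operands, gradients);
}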
For the k-th element of the LHS vector, the partial derivatives are the non-zero elements of the matrix in the k-th row and the operands are the corresponding elements of b, right?
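In CSR index terms (with the w/u/v naming above), that is

$$
y_k \;=\; \sum_{p = u_k}^{u_{k+1} - 1} w_p \, b_{v_p},
\qquad
\frac{\partial y_k}{\partial b_{v_p}} \;=\; w_p ,
$$

so the operands for row k are the entries of b indexed by v[u[k]], ..., v[u[k+1]-1] and the partials are the matching entries of w.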
From the autodiff paper, the key step in the precomputed_gradients implementation is

  varis_[i]->adj_ += adj_ * gradients_[i];

For this you need varis_[i]->adj_ += A.transpose() * adj_ to be implemented, which this won't do.
My code upthread should do it, but I haven't had a chance to actually test it out.
I think I understand your code, but the A.transpose() step is going to be on the expensive side.
But more generally, why does the precomputed_gradients function exist if not to handle cases like this? The wiki page Bob wrote recommends this way.
Is there an in-place transpose?
Not for sparse matrices.
I think the problem is that the gradient isn't w, it's the sparse matrix (w, u, v), so you're not passing it the right information.
It's possible that there's a clever way to use precomputed gradients when the gradient is neither a dense matrix nor a vector, but I'm not sure how.
But starting from triplets you could just compute the transpose once and store both sets of indexes: all you do is flip the triplets, and the storage/computation cost scales with the number of non-zero entries.
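A minimal sketch of that flip-the-triplets idea with Eigen (the function name and the std::pair return are illustrative only, not anything from Stan Math):

#include <Eigen/SparseCore>
#include <utility>
#include <vector>

// Build A and A^T from the same triplet list by swapping (row, col);
// both passes cost O(nnz) time and storage.
std::pair<Eigen::SparseMatrix<double, Eigen::RowMajor>,
          Eigen::SparseMatrix<double, Eigen::RowMajor> >
build_csr_and_transpose(int m, int n,
                        const std::vector<Eigen::Triplet<double> >& trips) {
  std::vector<Eigen::Triplet<double> > flipped;
  flipped.reserve(trips.size());
  for (const auto& t : trips)
    flipped.emplace_back(t.col(), t.row(), t.value());
  Eigen::SparseMatrix<double, Eigen::RowMajor> A(m, n), At(n, m);
  A.setFromTriplets(trips.begin(), trips.end());
  At.setFromTriplets(flipped.begin(), flipped.end());
  return std::make_pair(A, At);
}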
Actually, the problem is I am a dumbass. In my original code, this loop is indexed wrong:

for (int j = 0; j < nnz; j++) {
  operands[j] = b.coeff(v[j]);    // should be b.coeff(v[pos]);
  gradients[j] = w.coeff(pos++);
}
Now it is matching the finite differences:
stan::math::vector_v
csr_matrix_times_vector2(const int& m,
                         const int& n,
                         const Eigen::VectorXd& w,
                         const std::vector<int>& v,
                         const std::vector<int>& u,
                         const stan::math::vector_v& b,
                         std::ostream* pstream__) {
  // Map the CSR arrays (values w, column indexes v, row starts u) onto an
  // Eigen sparse matrix without copying.
  Eigen::Map<const Eigen::SparseMatrix<double, Eigen::RowMajor> >
    sm(m, n, w.size(), &u[0], &v[0], &w[0]);
  // Value of the product, computed once in doubles.
  Eigen::VectorXd result = sm * value_of(b);
  stan::math::vector_v out(m);
  int pos = 0;
  for (int k = 0; k < m; k++) {
    int nnz = u[k + 1] - u[k];
    if (nnz > 0) {
      // Operands are the entries of b touched by row k; the partials are
      // the matching non-zero values of the matrix.
      std::vector<stan::math::var> operands(nnz);
      std::vector<double> gradients(nnz);
      for (int j = 0; j < nnz; j++) {
        operands[j] = b.coeff(v[pos]);
        gradients[j] = w.coeff(pos++);
      }
      out.coeffRef(k) = precomputed_gradients(result.coeff(k),
                                              operands, gradients);
    } else {
      out.coeffRef(k) = 0.0;
    }
  }
  return out;
}
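For anyone who wants to reproduce that check outside a Stan model, here is a small standalone sketch along the same lines (the hand-picked 2x3 CSR arrays and the rest of the harness are purely illustrative), comparing the reverse-mode gradient of sum(A * b) against central finite differences:

#include <stan/math.hpp>
#include <Eigen/SparseCore>
#include <iostream>
#include <vector>
// assumes csr_matrix_times_vector2 from above is visible here

int main() {
  // 2 x 3 matrix with three non-zeros in CSR form.
  int m = 2, n = 3;
  Eigen::VectorXd w(3);
  w << 1.5, -2.0, 0.5;
  std::vector<int> v = {0, 2, 1};   // column index of each non-zero
  std::vector<int> u = {0, 2, 3};   // row k owns w[u[k]] .. w[u[k+1]-1]

  Eigen::VectorXd b0(3);
  b0 << 0.3, -1.2, 2.0;

  // Reverse-mode gradient of f(b) = sum(A * b).
  stan::math::vector_v b = b0.cast<stan::math::var>();
  stan::math::var f
      = stan::math::sum(csr_matrix_times_vector2(m, n, w, v, u, b, nullptr));
  f.grad();

  // Central finite differences of the same scalar function.
  Eigen::Map<const Eigen::SparseMatrix<double, Eigen::RowMajor> >
      sm(m, n, w.size(), &u[0], &v[0], &w[0]);
  const double eps = 1e-6;
  for (int i = 0; i < n; ++i) {
    Eigen::VectorXd bp = b0, bm = b0;
    bp(i) += eps;
    bm(i) -= eps;
    Eigen::VectorXd yp = sm * bp, ym = sm * bm;
    double fd = (yp.sum() - ym.sum()) / (2 * eps);
    std::cout << "param " << i << "  ad = " << b(i).adj()
              << "  fd = " << fd << std::endl;
  }
  stan::math::recover_memory();
  return 0;
}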
I hate to disagree here, but you're not. What I don't know is how my injunction not to insult people applies to self-insults!
Great news.
Well, usually it's just self-flagellation followed by asking questions that could be solved by 5 minutes with the manual, so Ben's case might be unique.
For sparse assembly, another option is hash assembly. For reordering, is it possible to use a third-party graph partitioner such as METIS? This would help an MPI solver, especially if we want to do domain decomposition later on.
My 2 cents…
METIS is a licensing nightmare, but other third-party partitioners are open source.