A proposal for sparse matrices and GPs in Stan


#17

Yes.

This was always doable and will be done once I finish this grant application. (Sorry I didn’t make that clear) But this is the minimal use of sparse matrices. We still need to have the language discussion if we’re going to manipulate them in-language, speed up lmer, or make them user friendly.


#18

You could, but a lot of the time spent in csr_matrix_times_vector is on checking validity of v and u, which don’t change. So, rstanarm needs a version without all that. If there were a way to construct a sparse_matrix in data or transformed data, then in each leapfrog step, you could just check that the matrix was conformable before multiplying by a vector.


#19

Is there any particular reason you can’t have a
csr_matrix_times_vector_no_check
function that forces w to be data?


#20

w is actually not checked except for its size being the same as that of v. But it checks that every element of v — which is already restricted to be an int[] is in the allowable range.


#21

That does make at least as much sense as stan/math/rev/mat/fun/multiply.hpp does. But in this case should I be using precomputed_gradients()?


#22

I was thinking that if w is data then the sparse matrix is static and it makes sense to only check it once in the transformed data block.

It’s probably worth thinking of cheaper ways to check that the sparsity on the left and right-hand sides of expressions like A_plus_B = A + B; match without having to do those expensive pre-checks.

I mean you can. But the sparse matrix construction doesn’t seem that expensive, so doing it twice isn’t a killer.


#23

It seems simpler to do it with precomputed_gradients but apparently I am doing it wrong

stan::math::vector_v
csr_matrix_times_vector2(const int& m,
                         const int& n,
                         const Eigen::VectorXd& w,
                         const std::vector<int>& v,
                         const std::vector<int>& u,
                         const stan::math::vector_v& b,
                         std::ostream* pstream__) {
  Eigen::Map<const Eigen::SparseMatrix<double,Eigen::RowMajor> >
    sm(m, n, w.size(), &u[0], &v[0], &w[0]);
  Eigen::VectorXd result = sm * value_of(b);
  stan::math::vector_v out(m);
  int pos = 0;
  for (int k = 0; k < m; k++) {
    int nnz = u[k + 1] - u[k];
    if (nnz > 0) {
      std::vector<stan::math::var> operands(nnz);
      std::vector<double> gradients(nnz);
      for (int j = 0; j < nnz; j++) {
        operands[j] = b.coeff(v[j]);
        gradients[j] = w.coeff(pos++);
      }
      out.coeffRef(k) = precomputed_gradients(result.coeff(k), 
                                              operands, gradients);
    }
    else out.coeffRef(k) = 0.0;
  }
  return out;
}

It seems that it is not chaining things together correctly because the gradients of the affected parameters come out too small:

 param idx           value           model     finite diff           error
         0         1.60449         14.7638         14.7638     7.70049e-07
         1         1.87157         -555.97         -555.97    -4.90843e-07
         2         1.04997        -1.04997        0.934164        -1.98413
...   All parameters between 2 and 48 inclusive have the wrong derivatives
        48         1.01874         722.119         663.832         58.2875

#24

I don’t think that’s going to work. The precomputed gradient needs to know how to do the matrix-vector product, so it needs u and v.


#25

Maybe the Doxygen for precomputed_gradients is not correct?

    /**
     * This function returns a var for an expression that has the
     * specified value, vector of operands, and vector of partial
     * derivatives of value with respect to the operands.
     *
     * @param[in] value The value of the resulting dependent variable.
     * @param[in] operands operands.
     * @param[in] gradients vector of partial derivatives of result with
     * respect to operands.
     * @return An auto-diff variable that uses the precomputed
     *   gradients provided.
     */
    inline var precomputed_gradients(double value,
                              const std::vector<var>& operands,
                              const std::vector<double>& gradients) {
      return var(new precomputed_gradients_vari(value, operands, gradients));
    }

For the k-th element of the LHS vector, the partial derivatives are the non-zero elements of the matrix in the k-th row and the operands are the corresponding elements of b, right?


#26

From the autodiff paper, the key step in the precomputed_gradients implementation is varis_[i]->adj_ += adj_ * gradients_[i];

For this you need varis_[i]->adj_ += A.transpose() * adj_ to be implemented, which this won’t do.

My code upthread should do, but I havenn’t had a chance to actually test it out.


#27

I think I understand your code, but the A.transpose() step is going to be on the expensive side.

But more generally why does the precomputed_gradients function exist if not to handle cases like this? The wiki page Bob wrote


recommends this way.


#28

Is there an in place transpose?


#29

Not for sparse matrices.


#30

I think the problem is that the gradient isn’t w it’s the sparse matrix (w,u,v), so you’re not passing it the right information.

It’s possible that there’s a clever way to use precomputed gradients when the gradient is neither a dense matrix or a vector, but i’m not sure how.


#31

But starting from triplets you could just compute the transpose once and store both sets of indexes. All you do is flip triplets and the storage/computation cost would scale with the number of non-zero entries.


#32

Actually, the problem is I am a dumbass. In my original code, this loop is indexed wrong

   for (int j = 0; j < nnz; j++) {
        operands[j] = b.coeff(v[j]);  // should be b.coeff(v[pos]);
        gradients[j] = w.coeff(pos++);
      }

Now, it is matching with the finite differences:

stan::math::vector_v
csr_matrix_times_vector2(const int& m,
                         const int& n,
                         const Eigen::VectorXd& w,
                         const std::vector<int>& v,
                         const std::vector<int>& u,
                         const stan::math::vector_v& b,
                         std::ostream* pstream__) {
  Eigen::Map<const Eigen::SparseMatrix<double,Eigen::RowMajor> >
    sm(m, n, w.size(), &u[0], &v[0], &w[0]);
  Eigen::VectorXd result = sm * value_of(b);
  stan::math::vector_v out(m);
  int pos = 0;
  for (int k = 0; k < m; k++) {
    int nnz = u[k + 1] - u[k];
    if (nnz > 0) {
      std::vector<stan::math::var> operands(nnz);
      std::vector<double> gradients(nnz);
      for (int j = 0; j < nnz; j++) {
        operands[j] = b.coeff(v[pos]);
        gradients[j] = w.coeff(pos++);
      }
      out.coeffRef(k) = precomputed_gradients(result.coeff(k),
                                              operands, gradients);
    }
    else out.coeffRef(k) = 0.0;
  }
  return out;
}

#33

I hate to disagree here, but you’re not. What I don’t know is how my injunction not to insult people applies to self-insults!

Great news.


#34

Well usually it’s just self-flagellation followed by asking questions that could be solved by 5 minutes with the manual so Ben’s case might be unique.


#35

For sparse assembly another option is hash-assembly. For reordering is it possible to use third-party graph partitioner such as metis? This would help MPI solver, especially if we want do domain decomposition later on.

My 2 cents…


#36

Metis is a licensing nightmare, but other third-party partitioners are open source.