Function to return number of non-zero elements in a matrix

I’d like to use Stan’s sparse matrix functions, such as csr_extract_w, but I need to known the number of non-zero values in order to declare this vector inside a Stan script.

Could you elaborate a little bit on your use case? I’m pretty sure (but not completely positive) that because Stan does not support discrete parameters, the only way for a matrix element to be a literal zero is if either

  • the matrix is data or transformed data
  • the matrix is of some constrained data type where certain elements are constrained to zero a priori.

In either case, the number of zeros should be a priori known.

1 Like

I have a sparse (random effects) matrix mostly filled with 0’s, by construction that I will multiply by a vector of regression coefficients under a sparse configuration. For reasons of how the associated regression coefficients are indexed, I am not able to render the sparse (w,v,u) representation outside of Stan (in R). I must do so within Stan and it has explicit functions for this purpose; e.g. csr_extract_w(). The problem is that to declare a Stan vector to hold the output of csr_extract_w(A) where A is some sparse matrix, I need to know the number of non-zero elements in A (to dimension the length of the output vector). Now, I can’t use R to get the number of non-zero elements in A because I will be embedding the sparse matrix multiplication within a reduce_sum() function.

Got it. Yeah, that seems like a tough indexing problem to solve. I imagine there is a way to handle all the indices in a super clever way, but also that it might not be worth anyone’s time to figure it out (note that it would be a potential efficiency gain, however, to slice the sparse representation rather than the full matrix in reduce_sum, because it would ensure that each chunk gets a similar number of nonzero elements).

Anyway, here is a function for the number of nonzero elements. Whether to double-loop over the matrix elements or (as I did) to flatten the matrix and loop once is a matter of taste.

  int nnz(matrix x) {
    int N = num_elements(x);
    vector[N] x2 = to_vector(x);
    int out = 0;
    for(i in 1:N){
      if(x2[i] != 0) {
        out += 1;
      }
    }
    return(out);
  }

I was stuck in a similar problem. I came up with a different solution that does not require loops.
non_zero = rank(append_row(to_vector(abs(x)), machine_precision()), num_elements(x)+1)

2 Likes

This is a super cool approach.

It’s also a matter of memory pressure. Your approach will be suboptimal because it’s allocating a new N-vector and filling it, which will take about as long as the rest of the program, which also goes over a loop just computing some arithmetic. I’d suggest:

int nnz(matrix x) {
  int nz = 0;
  for (n in 1:cols(x)) {
    for (m in 1:rows(x)) {
      nz += x[m, n] != 0.0;
    }
  }
  return nz;
}

It’d probably be faster with nz += !x[m, n]; to avoid the comparison and then return rows(x) * cols(x) - nz;.

2 Likes

For large arrays, what about:

int nz = to_int(sum(fmin(machine_precision(),abs(x)) ./ machine_precision()));

I tried to figure out a construction that avoided to_vector() and for-looping. It incurs a cost per infix operation but uses only operations natively vectorized in Stan.

I try to stay away from things that are this clever.

The way you coded it in Stan will have a problem because Stan’s not clever enough to use expression templates for all the intermediates, so there is going to be a lot of implicit allocation and copying overhead. Both fmin() and abs() will allocate new vectors the size of x.

That ./ can just be / as the value on the LHS is a scalar.

Because Stan code gets compiled to C++, there’s no difference between writing a loop in Stan and writing in natively in C++. The only difference is when there is autodiff and we can reduce the total number of operations or the size of the autodiff graph.

The code I wrote doesn’t do any allocation or any autodiff, so it should be as fast as if you had written the code directly in C++. All our vectorized code does is code the loops in C++ (sometimes delegating to Eigen, which can vectorize at the CPU level).

1 Like