I’d like to use Stan’s sparse matrix functions, such as csr_extract_w
, but I need to known the number of non-zero values in order to declare this vector inside a Stan script.
Could you elaborate a little bit on your use case? I’m pretty sure (but not completely positive) that because Stan does not support discrete parameters, the only way for a matrix element to be a literal zero is if either
- the matrix is data or transformed data
- the matrix is of some constrained data type where certain elements are constrained to zero a priori.
In either case, the number of zeros should be a priori known.
I have a sparse (random effects) matrix mostly filled with 0’s, by construction that I will multiply by a vector of regression coefficients under a sparse configuration. For reasons of how the associated regression coefficients are indexed, I am not able to render the sparse (w,v,u) representation outside of Stan (in R). I must do so within Stan and it has explicit functions for this purpose; e.g. csr_extract_w()
. The problem is that to declare a Stan vector to hold the output of csr_extract_w(A)
where A
is some sparse matrix, I need to know the number of non-zero elements in A
(to dimension the length of the output vector). Now, I can’t use R to get the number of non-zero elements in A
because I will be embedding the sparse matrix multiplication within a reduce_sum()
function.
Got it. Yeah, that seems like a tough indexing problem to solve. I imagine there is a way to handle all the indices in a super clever way, but also that it might not be worth anyone’s time to figure it out (note that it would be a potential efficiency gain, however, to slice the sparse representation rather than the full matrix in reduce_sum
, because it would ensure that each chunk gets a similar number of nonzero elements).
Anyway, here is a function for the number of nonzero elements. Whether to double-loop over the matrix elements or (as I did) to flatten the matrix and loop once is a matter of taste.
int nnz(matrix x) {
int N = num_elements(x);
vector[N] x2 = to_vector(x);
int out = 0;
for(i in 1:N){
if(x2[i] != 0) {
out += 1;
}
}
return(out);
}
I was stuck in a similar problem. I came up with a different solution that does not require loops.
non_zero = rank(append_row(to_vector(abs(x)), machine_precision()), num_elements(x)+1)
This is a super cool approach.
It’s also a matter of memory pressure. Your approach will be suboptimal because it’s allocating a new N
-vector and filling it, which will take about as long as the rest of the program, which also goes over a loop just computing some arithmetic. I’d suggest:
int nnz(matrix x) {
int nz = 0;
for (n in 1:cols(x)) {
for (m in 1:rows(x)) {
nz += x[m, n] != 0.0;
}
}
return nz;
}
It’d probably be faster with nz += !x[m, n];
to avoid the comparison and then return rows(x) * cols(x) - nz;
.
For large arrays, what about:
int nz = to_int(sum(fmin(machine_precision(),abs(x)) ./ machine_precision()));
I tried to figure out a construction that avoided to_vector()
and for-looping. It incurs a cost per infix operation but uses only operations natively vectorized in Stan.
I try to stay away from things that are this clever.
The way you coded it in Stan will have a problem because Stan’s not clever enough to use expression templates for all the intermediates, so there is going to be a lot of implicit allocation and copying overhead. Both fmin()
and abs()
will allocate new vectors the size of x
.
That ./
can just be /
as the value on the LHS is a scalar.
Because Stan code gets compiled to C++, there’s no difference between writing a loop in Stan and writing in natively in C++. The only difference is when there is autodiff and we can reduce the total number of operations or the size of the autodiff graph.
The code I wrote doesn’t do any allocation or any autodiff, so it should be as fast as if you had written the code directly in C++. All our vectorized code does is code the loops in C++ (sometimes delegating to Eigen, which can vectorize at the CPU level).