Rowsum or columnsum of a matrix

Hi,

I have a naive question. I am working on a hierarchical multinomial model with individual-level coefficients. I have N people, K outcomes per choice set, and Z attributes; different people have different numbers of choice sets.

I find myself needing the following operation: element-multiply an N x Z matrix with another N x Z matrix, and then sum across the columns of each row to get an N x 1 vector.

Is there an operator to perform the columnsum? I apologize if it is already in the manual; I could not seem to find it.

Thanks.

There is no such function. You could pre-multiply the thing by a row_vector of ones.

Late follow-up, but it shouldn’t be hard to just write a row sum or column sum function in Stan. Input would be a matrix and output a vector.
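A minimal sketch of such user-defined functions (the names `col_sums` and `row_sums` are mine, not built-ins), using Stan's `rows()`, `cols()`, `row()`, `col()`, and `sum()`:

```stan
functions {
  // Column sums: returns a row_vector with one entry per column of m.
  row_vector col_sums(matrix m) {
    row_vector[cols(m)] s;
    for (j in 1:cols(m))
      s[j] = sum(col(m, j));  // col(m, j) extracts column j as a vector
    return s;
  }
  // Row sums: returns a vector with one entry per row of m.
  vector row_sums(matrix m) {
    vector[rows(m)] s;
    for (i in 1:rows(m))
      s[i] = sum(row(m, i));  // row(m, i) extracts row i as a row_vector
    return s;
  }
}
```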

Replying because I played with this last night and it took me a bit to work out how to implement Ben’s suggestion. For a matrix m with r rows and c columns:

matrix[r,c] m ;

column-sum is achieved by:

row_vector[r] rv = rep_row_vector(1.0,r) ;
row_vector[c] colsum = rv * m ;

and row-sum is achieved by:

vector[c] v = rep_vector(1.0,c) ;
vector[r] rowsum = m * v ;

However, I’ll note that for all the matrix sizes I’ve explored, simply doing the sums in a loop is faster (and note that if you have the freedom to choose the orientation of your matrix such that you want colSums, that’s fastest thanks to the column-major storage/access for matrices).
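For reference, a sketch of the loop versions I compared against (assuming the same matrix m as above):

```stan
vector[r] rowsum;
row_vector[c] colsum;
for (i in 1:r)
  rowsum[i] = sum(m[i]);       // m[i] is row i as a row_vector
for (j in 1:c)
  colsum[j] = sum(col(m, j));  // col(m, j) is column j as a vector
```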


@mike-lawrence: That’s always going to be much slower because it involves a multiply-add instead of just a sum. It can be made more efficient by not assigning to intermediates and pre-allocating the vectors of 1 values.

transformed data {
  vector[C] v_ones = rep_vector(1, C);
}
...
model {
  // m is an R x C matrix
  vector[R] row_sums = m * v_ones;
  ...
}

This avoids reallocating and assigning the vector of ones and also makes sure it remains a constant rather than an autodiff variable. Doing it this way should be close in speed to doing the sums directly and might even dominate for big matrices because the vector product will be better at blocking for locality.
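Applied to the original question (two N x Z matrices multiplied elementwise, then summed within each row), the same pattern would look like this sketch, where the names `a`, `b`, and `s` are mine:

```stan
transformed data {
  vector[Z] z_ones = rep_vector(1, Z);
}
...
model {
  // a and b are N x Z; .* is elementwise multiplication
  vector[N] s = (a .* b) * z_ones;
}
```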

Ah, I’m fairly sure I did it precisely this way when I reported loops being faster.