Summing values within group

Hi,

I am trying to write a function that does simple aggregation by group.

Suppose I have two variables, groupid and values:

groupid values
1 3
1 2
1 4
2 2
2 2
2 3

The output I want is a 2x1 vector:

9
7

Is there any easy way to go about this?

Thank you!

Are you trying to do this in Stan? It’s easy to work on rows or columns by slicing in Stan, but I’m not sure what 9 and 7 are supposed to represent. In general, if you have two variables:

array[N] int x;
array[N] int y;

you can build the result with

vector[2] z = [foo(x), bar(y)]';

where foo and bar are the functions to apply to x and y.

Hi, I am not the OP and apologize for bumping an old post.

The 9 and 7 in OP’s example are the group sums: 9 = 3+2+4, which corresponds to groupid == 1, and 7 = 2+2+3, which corresponds to groupid == 2.

What OP was asking for (and I am interested in the same thing now) is the equivalent of R’s rowsum(x, group) function. Please help if possible.

vector sum_by(vector x, array[] int by_index) {
  int J = num_elements(by_index);
  int N = num_elements(x);
  vector[J] x_by;

  for (n in 1:N)
    x_by[by_index[n]] += x[n];

  return x_by;
}

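Thank you very much for this. I tested it and found two issues, which I then fixed: J has to be the number of groups, max(by_index), rather than the length of by_index, and x_by has to be initialized to zero before accumulating. Please review at your convenience.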

vector sum_by(vector x, array[] int by_index) {
  int J = max(by_index);
  int N = num_elements(x);
  vector[J] x_by = rep_vector(0, J);

  for (n in 1:N)
    x_by[by_index[n]] += x[n];

  return x_by;
}

That looks good. There’s a convenience function (which may be slightly faster) to make the zero vector.

vector sum_by(vector x, array[] int by_index) {
  int J = max(by_index);
  int N = num_elements(x);
  vector[J] x_by = zeros_vector(J);

  for (n in 1:N)
    x_by[by_index[n]] += x[n];

  return x_by;
}
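
For anyone landing here later, here is a minimal sketch of how the function could be wired into a complete program. The block layout and the names N, groupid, x, and group_sums are assumptions for illustration only; they are not from the posts above. It assumes the group ids run from 1 to the number of groups.

```stan
functions {
  // Sum x within each group; by_index[n] is the group (1..J) of x[n].
  vector sum_by(vector x, array[] int by_index) {
    int J = max(by_index);
    int N = num_elements(x);
    vector[J] x_by = zeros_vector(J);

    for (n in 1:N) {
      x_by[by_index[n]] += x[n];
    }
    return x_by;
  }
}
data {
  int<lower=1> N;                  // number of observations (assumed name)
  array[N] int<lower=1> groupid;   // group id of each observation
  vector[N] x;                     // values to aggregate
}
transformed data {
  // One sum per group, computed once when the data are read.
  vector[max(groupid)] group_sums = sum_by(x, groupid);
  print(group_sums);
}
```

With the data from the original post (N = 6, groupid = 1, 1, 1, 2, 2, 2 and x = 3, 2, 4, 2, 2, 3), group_sums should hold 9 and 7. Since there are no parameters, a program like this would have to be run with the fixed_param sampler.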

If you have other functions that you want to apply by group, I suggest sorting the x vector so that group 1 occupies positions 1 : g1, group 2 occupies positions g1 + 1 : g1 + g2, and so on, where g1 and g2 are the group sizes. Then you can do the following, passing the sorted x vector and the size of each group. You can then replace the sum() function with whatever other function you want to apply by group.

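vector sum_by(vector x, array[] int num_in_grp) {
  int J = num_elements(num_in_grp);
  vector[J] x_by;   // renamed so the local does not share the function's name
  int start = 1;    // index in x where the current group begins

  for (j in 1:J) {
    // x must be sorted so that each group is contiguous
    x_by[j] = sum(segment(x, start, num_in_grp[j]));
    start += num_in_grp[j];
  }

  return x_by;
}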
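
As an illustration of swapping in another function, here is a rough sketch of a group mean built on the same segment() pattern. The name mean_by is hypothetical, and the sketch assumes that x is already sorted by group, that the group sizes sum to the length of x, and that no group is empty (mean() is not defined for an empty vector).

```stan
functions {
  // Mean of each contiguous group in a vector x sorted by group.
  // num_in_grp[j] is the number of elements in group j.
  vector mean_by(vector x, array[] int num_in_grp) {
    int J = num_elements(num_in_grp);
    vector[J] out;
    int start = 1;  // index in x where the current group begins

    for (j in 1:J) {
      out[j] = mean(segment(x, start, num_in_grp[j]));
      start += num_in_grp[j];
    }
    return out;
  }
}
```

For the example in the original post, the group sizes would be 3 and 3, and the group means would come out as 3 and 7/3.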