Summing values within group

GiKim · February 27, 2023, 4:09pm

Hi,

I am trying to write a function that does simple aggregation by group.

Suppose I have two variables, groupid and values:

groupid values
1 3
1 2
1 4
2 2
2 2
2 3

The output I want is a 2x1 vector:

9
7

Is there any easy way to go about this?

Thank you!

Bob_Carpenter · March 17, 2023, 9:08pm

Are you trying to do this in Stan? It’s easy to work on rows or columns by slicing in Stan, but I’m not sure what 9 and 7 are supposed to represent. In general, if you have two variables:

array[N] int x;
array[N] int y;

vector[2] z = [foo(x), bar(y)]';

Where foo and bar are the functions to apply to x and y.

sonicking · August 29, 2024, 5:51pm

Hi, I am not the OP and apologize for bumping an old post.

But the 9 and 7 in OP’s example are the group sums: 9 = 3+2+4 for groupid == 1, and 7 = 2+2+3 for groupid == 2.

What OP was asking for (and I am interested in the same thing now) is the equivalent of R’s rowsum(x, group = ) function. Please help if possible.

spinkney · August 29, 2024, 8:49pm

  vector sum_by(vector x, array[] int by_index) {
    int J = num_elements(by_index);
    int N = num_elements(x);
    vector[J] x_by;

    for (n in 1:N)
      x_by[by_index[n]] += x[n];

    return x_by;
  }

sonicking · August 30, 2024, 1:32am

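Thank you very much for this. I tested it and found two issues, which I then fixed: J needs to be the number of groups, max(by_index), rather than the length of by_index, and x_by needs to be initialized to zero before accumulating. Please review at your convenience.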

  vector sum_by(vector x, array[] int by_index) {
    int J = max(by_index);               // number of groups
    int N = num_elements(x);
    vector[J] x_by = rep_vector(0, J);   // start every group sum at zero

    for (n in 1:N)
      x_by[by_index[n]] += x[n];

    return x_by;
  }

spinkney · August 30, 2024, 8:39pm

That looks good. There’s a convenience function (which may be slightly faster) to make the zero vector.

  vector sum_by(vector x, array[] int by_index) {
    int J = max(by_index);
    int N = num_elements(x);
    vector[J] x_by = zeros_vector(J);

    for (n in 1:N)
      x_by[by_index[n]] += x[n];

    return x_by;
  }
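
As a quick check, here is a minimal sketch that calls the function above on the data from the original post. The variable names groupid, y, and totals are only illustrative, and the program has no parameters, so it would need to be run with the fixed_param algorithm.

```stan
functions {
  // Sum x within each group; by_index[n] is the group (1..J) of x[n].
  vector sum_by(vector x, array[] int by_index) {
    int J = max(by_index);
    int N = num_elements(x);
    vector[J] x_by = zeros_vector(J);

    for (n in 1:N) {
      x_by[by_index[n]] += x[n];
    }
    return x_by;
  }
}
transformed data {
  // Data from the original post (names are illustrative).
  array[6] int groupid = {1, 1, 1, 2, 2, 2};
  vector[6] y = [3, 2, 4, 2, 2, 3]';
  vector[2] totals = sum_by(y, groupid);

  // Group 1: 3 + 2 + 4 = 9; group 2: 2 + 2 + 3 = 7.
  print("totals = ", totals);
}
```

Note that this assumes the group ids are the integers 1 through J; a group with no observations simply gets a sum of 0.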

If you have other functions that you want to apply by group, I suggest sorting the x vector so that group 1 occupies positions 1:g1, group 2 occupies positions g1 + 1 : g1 + g2, and so on, where g1 and g2 are the group sizes. Then you can use the version below, passing the sorted x vector and the size of each group. You can then replace sum() with whatever other function you want to apply by group.

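  vector sum_by(vector x, array[] int num_in_grp) {
    int J = num_elements(num_in_grp);   // number of groups
    vector[J] x_by;
    int start = 1;                      // first index of the current group in x

    for (j in 1:J) {
      // x must already be sorted by group; sum the block belonging to group j
      x_by[j] = sum(segment(x, start, num_in_grp[j]));
      start += num_in_grp[j];
    }

    return x_by;
  }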
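
To use the sorted version you need the sorted vector and the group sizes. The sketch below shows one way to build both from an unsorted group index. The helper group_sizes and the name sum_sorted_by are hypothetical, introduced here only for illustration; sort_indices_asc and segment are built-in Stan functions.

```stan
functions {
  // Hypothetical helper: number of observations in each group 1..J.
  array[] int group_sizes(array[] int by_index) {
    int J = max(by_index);
    array[J] int n_in_grp = rep_array(0, J);

    for (n in 1:num_elements(by_index)) {
      n_in_grp[by_index[n]] += 1;
    }
    return n_in_grp;
  }

  // Same idea as the segment-based function above, under an illustrative name.
  vector sum_sorted_by(vector x_sorted, array[] int num_in_grp) {
    int J = num_elements(num_in_grp);
    vector[J] out;
    int start = 1;

    for (j in 1:J) {
      out[j] = sum(segment(x_sorted, start, num_in_grp[j]));
      start += num_in_grp[j];
    }
    return out;
  }
}
transformed data {
  // Same values as the original post, but deliberately unsorted.
  array[6] int groupid = {2, 1, 2, 1, 1, 2};
  vector[6] y = [2, 3, 2, 2, 4, 3]';

  // Reorder y so that all group 1 values come first, then group 2.
  array[6] int idx = sort_indices_asc(groupid);
  vector[6] y_sorted = y[idx];

  // Group 1: 3 + 2 + 4 = 9; group 2: 2 + 2 + 3 = 7.
  vector[2] totals = sum_sorted_by(y_sorted, group_sizes(groupid));
  print("totals = ", totals);
}
```

Swapping sum() for mean(), max(), or similar inside the loop would give other per-group summaries, provided every group has at least one observation.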
