Matrix algebra: parameter with two indices


#1

I have a model with a coefficient (theta) that I want to vary by two other categorical variables, group and time. Suppose there are G=20 groups and T=10 time points, then I can declare the theta parameter as

  matrix[T, G] theta ; 

The group and time indices are passed in as data, as integer vectors of length N (number of observations). In the model step, the easy way to index the particular theta element that corresponds to each observation would be with a for loop in the model step, e.g.

real theta_use[N] ;
for(n in 1:N){
   theta_use[n] = theta[time[n], group[n]];
}

It seems this could be done more efficiently with matrix algebra, but I can’t figure out the best way. Any suggestions? I was thinking that I could pass in (from R) two “map” matrices of dummy variables, time_map and group_map, that are N\times T and N\times G respectively (no intercept). I’m just not sure what to do with these in Stan.

FWIW, this is just a small part of a much larger program, but I eliminated the rest of the details as they are not necessary for understanding this issue.

Thanks,
Jon


#2

I think for what you describe, the code you wrote is better. Doing row/element selections with linear algebra operations isn’t gonna be efficient. You’d be multiplying by vectors of mostly zeros.


#3

I did some tests in Stan and it looks like you’re right. In R the opposite is true - the linear algebra is much more efficient than the for loop. But I think that’s because for loops are particularly slow in R.

Oh well.


#4

Yes, loops are slow in R. Stan just compiles straight to C++. There’s a chapter in the manual on efficiency that explains why — the autodiff overhead of the multiply-by-zero in memory and time is high. But looping is fast.