Vector braodcasting from matrix by multiple indexing for parameter with two 'layers'

I encountered the following question when writing a model where one parameter has two ‘layers’, in my case being a hierarchical grouping and a categorical explanatory variable. More specifically, the parameter is depending on the treatment (one out of 3) and the hierarchical group (one out of 5) for each data point, hence a matrix of dimension (n,m) with n=\text{number of groups}=5 and m=\text{number of treatments} = 3. Let’s assume there are 10 data points.
How can I efficiently broadcast a vector from a matrix? The example stan program below does what I want but the solution with building a big matrix and taking the diagonal is not so transparent and probably also not so efficient. I thought I could broadcast a vector from a matrix by an index being an array (of length 10) of arrays with the length 2 (the matrix dimension) encoding the group and the treatment for each data point, but I think there is no such thing like an array of arrays.

data {
  int<lower=0> n;
  int<lower=0> m;
  int<lower=0> k;
  matrix[n,m] a;
  int<lower=0> I[k];
  int<lower=0> J[k];
}
model{
}
generated quantities{
  vector[k] z;
  z = diagonal(a[I,J]);
}

library("rstan")
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)

a1 <- sample(1:20, 5, replace=FALSE)
a2 <- sample(1:20, 5, replace=FALSE)
a3 <- sample(1:20, 5, replace=FALSE)


data_test <- list(
  n = 5,
  m = 3,
  k = 10,
  a = t(rbind(a1,a2,a3)),
  I = sample(1:5, 10, replace = TRUE),
  J = sample(1:3, 10, replace = TRUE)
)

# fit
test1<-stan(
  file = <the stan program just above>,  # Stan program
  algorithm = "Fixed_param",
  data = data_test,    # named list of data
  chains = 1,             # number of Markov chains
  warmup = 0,          # number of warmup iterations per chain
  iter = 1,            # total number of iterations per chain
)

rstan::extract(test1)

Taking the diagonal means that n == m and so there’s only one index.

For instance,

vector[n] diag;
for(i in 1:n) {
  diag[i] = mat[n, n];
}

gets the diagonal. Also it looks like your matrix is 3x5, so the diagonal isn’t covering the whole matrix if that makes sense.

Usually if you have K data points and N groups and M treatments in a regression, you would have data that looks like:

data {
  int N; // number of groups
  int M; // number of treatments
  int K; // number of measurements
  int<lower=1, upper=N> ns[K]; // treatment of each measurement
  int<lower=1, upper=M> ms[K]; // group of each measurement
}

The matrix I take the diagonal of is square, as \texttt{a[I,J]} broadcasts a matrix of dimensions (\text{length}(I), \text{length}(J)) = (10,10). More precisely, \texttt{a[I,J]} builds a matrix by picking rows of \texttt{a} according to \texttt{I} and entries of these rows according to \texttt{J}. What I want in the end is the vector v = (\texttt{a}[I_k, J_k])_{k=1}^{10}. \texttt{z} = v in my code does this, but not very directly.

Yes, this is the data in my model, too (the stan program I posted was just to demonstrate the behaviour of the matrix broadcasting, not my actual model). \texttt{I} is \texttt{ns} and \texttt{J} is \texttt{ms} in your notation. But how do you then code the coefficients for the treatments (stratified by group) in this regression? This is exactly what \texttt{a} should do in my code.

Oooo, right, I forgot that Stan does that weird matrix broadcasting thing.

I think just write the for loop.

vector[k] z;
for(idx in 1:k) {
  z[idx] = mat[I[idx], J[idx]];
}

OK, yes this is indeed a simpler and better solution for this problem. Thank you! (And I happy that I am not the only one stumbling over this matrix broadcasting thing…) Still, isn’t this a very common problem it would be forth having an efficient function for it?

Yeah it comes up from time to time.

This is probably something we should look at. For loops in Stan are usually pretty fast so usually I don’t worry about using them outside of evaluating lpdfs (they compile directly to c++), but we’re working on some stuff with the autodiff now where we should probably do something special for this (so thanks for reminding me of this).