Repeat rows of matrix according to values held in array

Hi all,
I’m very new to Stan, so apologies for what I suspect is a simple problem; I haven’t managed to find a solution that works for my code in any of the online help I have checked.

I’m trying to write a function for my Stan program that takes two arguments: (1) a matrix X with N rows and p columns, and (2) an array m of N integers giving the number of times to repeat each row of X (I know that Stan vectors hold reals, not integers, so I can’t pass m as a vector?). The function should return a matrix Xout with sum(m) rows and p columns, in which the ith row of X (denoted X[i,]) is repeated m[i] times, creating a “long” version of the matrix. I can’t do this outside Stan, because X contains values that are updated during the model fit.

I have tried a range of things, each of which gives errors, e.g. (1) passing m in as a vector of reals, extracting each element and assigning it to a temporary int variable (not shown), or (2) passing m in as an array of integers (non-working code below). I have been using rep_matrix to repeat the rows of the matrix.

The best I have so far is (I think):

  matrix matrixlongver(matrix X, int m[]){
    row_vector[cols(X)] Xtemp = X[1,];
    matrix[m[1],cols(X)] Xout = rep_matrix(Xtemp,m[1]);
    for(i in 2:rows(X)){
      Xtemp = X[i,];
      matrix[m[i],cols(X)] Xtemp2 = rep_matrix(Xtemp,m[i]);
      Xout = append_row(Xout,Xtemp2)
    }
    return Xout;
  }

This gives me the error message:

"PARSER EXPECTED: <argument declaration or close paren ) to end argument declarations>
Error in stanc(file = file, model_code = model_code, model_name = model_name, : "

Could anyone offer help, hints, tips, or pointers to already-solved problems on the best way to go about this?
Many thanks

Hi,
it looks like the C-like nature of Stan is biting you (which happens very easily :-) and the error messages are not very helpful either. Anyway, there are multiple small issues with your syntax and some problems with your logic.

  • Instead of int m[], function argument declarations need int[] m (there is no deep reason for this; it is just how Stan writes array types in argument lists, even though C itself would accept int m[])
  • Stan allows variable declarations only at the very beginning of each block ({ ... }), so instead of

    for(i in 2:rows(X)){
      Xtemp = X[i,];
      matrix[m[i],cols(X)] Xtemp2 = rep_matrix(Xtemp,m[i]);

    you need to write:

    for(i in 2:rows(X)){
      matrix[m[i],cols(X)] Xtemp2;
      Xtemp = X[i,];
      Xtemp2 = rep_matrix(Xtemp,m[i]);

  • Finally, the sizes of matrices and vectors are fixed once declared in Stan, so Xout = append_row(Xout, whatever); cannot work, as it would change the size of Xout. Instead, you need to preallocate the full-size matrix and then assign into it. (Your append_row line is also missing its trailing semicolon.)

Here’s how I would write that function:

matrix matrixlongver(matrix X, int[] m){
  matrix[sum(m),cols(X)] Xout;   // preallocate the full "long" matrix
  int next_row = 1;              // first row of Xout not yet filled
  for(i in 1:rows(X)){
    if(m[i] < 0) {
      reject("m has to be non-negative");
    }
    // copy row i of X into the next m[i] rows of Xout
    Xout[next_row:(next_row + m[i] - 1),] = rep_matrix(X[i,], m[i]);
    next_row = next_row + m[i];
  }
  return Xout;
}
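One caveat if you are on a recent Stan release: as far as I know, the int[] m argument syntax was deprecated in Stan 2.26 and removed in 2.33, where array types are written array[] int m instead (the rstan example below still uses the older form). Here is a sketch of the same function in the newer syntax, assuming Stan 2.33 or later. The check that m has one entry per row of X and the m[i] > 0 guard, which means an empty row range is never indexed, are my own additions rather than part of the function above:

```stan
functions {
  // Repeat row i of X m[i] times, stacking the copies in row order.
  // Sketch in the post-2.26 array syntax (assumes Stan 2.33 or later).
  matrix matrixlongver(matrix X, array[] int m) {
    matrix[sum(m), cols(X)] Xout;  // preallocate the full "long" matrix
    int next_row = 1;              // first row of Xout not yet filled
    if (size(m) != rows(X)) {
      reject("m must have one entry per row of X");
    }
    for (i in 1:rows(X)) {
      if (m[i] < 0) {
        reject("m must be non-negative");
      }
      // zero counts are skipped, so no empty slice is ever assigned
      if (m[i] > 0) {
        Xout[next_row:(next_row + m[i] - 1)] = rep_matrix(X[i], m[i]);
        next_row += m[i];
      }
    }
    return Xout;
  }
}
```

It is called the same way as before, e.g. matrix[sum(m), p] Xlong = matrixlongver(X, m); in transformed parameters, assuming m is declared as data and X as a parameter with p columns.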

And here is a piece of R code (using rstan) showing that it seems to work:

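library(rstan)

stan_code <- '
functions {
  matrix matrixlongver(matrix X, int[] m){
    matrix[sum(m),cols(X)] Xout;
    int next_row = 1;
    for(i in 1:rows(X)){
      if(m[i] < 0) {
        reject("m has to be non-negative");
      }
      Xout[next_row:(next_row + m[i] - 1),] = rep_matrix(X[i,], m[i]);
      next_row = next_row + m[i];
    }
    return Xout;
  }
}'

# compile the functions block and make matrixlongver() callable from R
expose_stan_functions(stan_model(model_code = stan_code))
in_matrix <- matrix(1:6, nrow = 3, ncol = 2)  # 3 x 2, filled column-wise
in_matrix
m <- c(0L, 2L, 3L)  # drop row 1, repeat row 2 twice and row 3 three times
m
matrixlongver(in_matrix, m)
# expected: 5 x 2 matrix with rows (2, 5), (2, 5), (3, 6), (3, 6), (3, 6)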

Note that this will give you the following message:

DIAGNOSTIC(S) FROM PARSER:
Info: left-hand side variable (name=next_row) occurs on right-hand side of assignment, causing inefficient deep copy to avoid aliasing.

This is safe to ignore here: next_row is a single int, so the copy the message warns about costs essentially nothing.
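An alternative you might also consider: because m is an integer array, it has to be data (Stan has no integer parameters), so the expanded row index can be computed once in transformed data, and the rows of X then selected with Stan's multi-indexing, X[idx]. This swaps the copy loop for a precomputed index. The helper name expand_index and the variable idx are hypothetical names I made up, and the sketch uses the newer array[] syntax:

```stan
functions {
  // Hypothetical helper: for counts m, return the row index sequence
  // 1 (m[1] times), 2 (m[2] times), ..., N (m[N] times).
  array[] int expand_index(array[] int m) {
    array[sum(m)] int idx;
    int pos = 1;  // next free slot in idx
    for (i in 1:size(m)) {
      if (m[i] < 0) {
        reject("m must be non-negative");
      }
      // the inner loop runs zero times when m[i] == 0
      for (k in 1:m[i]) {
        idx[pos] = i;
        pos += 1;
      }
    }
    return idx;
  }
}
```

In the model you would then write something like array[sum(m)] int idx = expand_index(m); in transformed data and matrix[sum(m), p] Xlong = X[idx]; in transformed parameters (again assuming X has p columns). With m = {0, 2, 3}, idx is {2, 2, 3, 3, 3}.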

Hope that helps!