to_array_1d row-major and to_vector column-major ordering: why?


#1

Hi everyone,

I recently implemented a simple multi-class logistic regression in Stan, and wrote the following:

data {
  int n; // number of sites
  int k; // number of species
  int p; // number of predictors

  int y[n, k]; // number of observations for each species
  matrix[n, p] X; // design matrix of predictors
}
transformed data {
  int flattened_y[n * k] = to_array_1d(y);
}
parameters {
  matrix[p, k] beta_1; // species response to predictors
}
model {
  matrix[n, k] logit_response;
  logit_response = X * beta_1;

  to_vector(beta_1) ~ normal(0, 1);

  // Careful: to_vector flattens in column-major order, while to_array_1d
  // flattens arrays in row-major order, so transpose before flattening.
  flattened_y ~ bernoulli_logit(to_vector(logit_response'));
}

I had no end of trouble because I hadn’t realised that to_array_1d and to_vector flatten in different orders: to_vector uses column-major order and to_array_1d uses row-major order, so the responses and the linear predictor ended up misaligned. I managed to fix it with the transpose in the code above, but it took a while to track down! Is there a good reason for the difference?

Thanks,
Martin
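
To see why the transpose in the model above lines things up, here is a minimal C++ sketch. It is not Stan code and not how Stan implements these functions; the helper names flatten_row_major, flatten_col_major, and transpose are made up for illustration. It flattens a small table both ways so the two orders can be compared.

```cpp
#include <cstddef>
#include <vector>

using Table = std::vector<std::vector<int>>;  // hypothetical n x k table, stored as rows

// Row-major flattening: the last index varies fastest
// (the order the thread describes for to_array_1d on arrays).
std::vector<int> flatten_row_major(const Table& a) {
  std::vector<int> out;
  for (const auto& row : a)
    for (int v : row) out.push_back(v);
  return out;
}

// Column-major flattening: the first index varies fastest
// (the order the thread describes for to_vector on matrices).
std::vector<int> flatten_col_major(const Table& a) {
  std::vector<int> out;
  if (a.empty()) return out;
  for (std::size_t j = 0; j < a[0].size(); ++j)
    for (std::size_t i = 0; i < a.size(); ++i) out.push_back(a[i][j]);
  return out;
}

// Swap rows and columns: element (i, j) moves to (j, i).
Table transpose(const Table& a) {
  if (a.empty()) return {};
  Table t(a[0].size(), std::vector<int>(a.size()));
  for (std::size_t i = 0; i < a.size(); ++i)
    for (std::size_t j = 0; j < a[0].size(); ++j) t[j][i] = a[i][j];
  return t;
}
```

For y = {{1, 2, 3}, {4, 5, 6}}, row-major flattening gives 1, 2, 3, 4, 5, 6 and column-major flattening gives 1, 4, 2, 5, 3, 6. Column-major flattening of the transpose gives 1, 2, 3, 4, 5, 6 again, which is why transposing logit_response before to_vector matches flattened_y.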


#2

Each function returns the elements in the order its container is stored in memory, which avoids copying. Unfortunately, the storage orders of matrices and arrays don’t match.


#3

Perhaps not a good one. We went with the default column-major ordering of the Eigen C++ library, whose matrices we use in Stan. Arrays have to be row-major because they’re implemented as arrays of arrays (using std::vector under the hood).
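
As a rough illustration of the two storage layouts described above, here is a small C++ sketch using only the standard library. ColMajorMatrix and Array2D are hypothetical stand-ins, not Eigen or Stan types, and the function names to_vector_like and to_array_1d_like are invented for this example. The point is only that reading each container in its natural storage order produces a different flattening.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a column-major matrix: one contiguous buffer,
// with element (i, j) stored at offset j * rows + i.
struct ColMajorMatrix {
  std::size_t rows, cols;
  std::vector<double> data;
  ColMajorMatrix(std::size_t r, std::size_t c)
      : rows(r), cols(c), data(r * c, 0.0) {}
  double& at(std::size_t i, std::size_t j) { return data[j * rows + i]; }
};

// Reading the buffer front to back yields column-major order;
// no reordering of elements is needed.
std::vector<double> to_vector_like(const ColMajorMatrix& m) { return m.data; }

// Hypothetical stand-in for a 2-D array: a vector of row vectors.
using Array2D = std::vector<std::vector<double>>;

// Concatenating the rows in storage order yields row-major order.
std::vector<double> to_array_1d_like(const Array2D& a) {
  std::vector<double> out;
  for (const auto& row : a) out.insert(out.end(), row.begin(), row.end());
  return out;
}
```

Filling both containers with the same 2 x 3 values (1 to 6, row by row) and flattening each in its own storage order gives 1, 4, 2, 5, 3, 6 for the matrix stand-in and 1, 2, 3, 4, 5, 6 for the array stand-in.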