Ordered matrix along both rows and columns

It’s possible to build a matrix so that either the columns or the rows are increasing, but is it possible to do both at the same time, even with a workaround? This means that I need both X_{i,j}>X_{i-1,j} and X_{i,j} > X_{i, j-1}.

A possible solution would be to define ordered vectors for the columns with the previous column as a lower bound, but unfortunately I can’t seem to define bounds when declaring an ordered vector.

1 Like

There’s a math issue to add bounds to ordered vectors that includes the Stan code to do it. Add lower and upper bounds to ordered vector · Issue #2959 · stan-dev/math · GitHub

1 Like

That issue gives Stan code only for scalar bounds but an ordered matrix needs the bounds to be vectors. As discussed in the issue, vector bounds are much more difficult to implement.
Fortunately, the really difficult case happens when both upper and lower bounds are vectors–with only a lower bound a smooth constraining transform is possible.

functions {
  matrix ordered_matrix_lp(vector spine, matrix submatrix) {
    int N = rows(submatrix) + 1;
    int M = cols(submatrix) + 1;
    matrix[N,M] output;
    output[1,:] = spine[1:M]';
    for (r in 2:N) {
      output[r,M] = spine[M-1 + r];
      for (c in 1:M-1) {
        int j = M-c;
        real diff = output[r,j+1] - output[r-1,j];
        output[r,j] = output[r-1,j] + diff*submatrix[r-1,j];
        target += log(diff);
      }
    }
    return output;
  }
}
data {
  int<lower=1> N;
  int<lower=1> M;
}
parameters {
  ordered[N+M-1] spine;
  matrix<lower=0,upper=1>[N-1,M-1] submatrix;
}
transformed parameters {
  matrix[N,M] mat = ordered_matrix_lp(spine, submatrix);
}
1 Like

Just curious what type of model/data do you have where this constraint happens?

Thank you very much for taking the time. I haven’t yet had the chance since then to go back to this model. I think the approach I had in mind is one where the bounds are also vectors. Possibly with only lower vectors it is possible to build the matrix I had in mind. For example, the first column has no lower bound, the second column has the first as its lower bound (which implicitly gives an upper bound to the first column, but is not declared explicitly), and so on, and the final column would have a scaler upper bound and avoid the issues.

In terms of application where this constraint happens. I was thinking of this approach as a way to model winning probability in sports during a game. Think of X as a matrix of the probabilities (or some transformation in unconstrained space) of the home team winning. Each row of X represents a time in the game, going from the beginning to the end of the game. Each column of X represents the score, so that in column 1 the home team is leading by 1, in column 2 the home team is leading by 2, and so on.

The idea of the constraint is:

  • Within each score (say 1-0, so within a column and across the rows), the probability of the home team winning should always increase as time goes by, as there is less time for the score to revert.
  • Within each second of the game (e.g. at minute 20, so within a row and across the columns) the probability of the home team winning if they score another goal (and move to the next column) should always increase.

I think these are reasonable assumptions. Hopefully the data already delivers them, but this would help avoid overfitting in rare situations (e.g. a very early large lead), as well as avoiding sample selection issues (the games and scores we observe are not fully independent of the true winning probabilities).

Do you think this is a reasonable setting where this constraint would be helpful?

2 Likes

The transform that @nhuurre wrote has the vector lower bounds and should work for your problem.

If you want the matrix to sum to 1 then you can just use an ordered simplex

functions {
   //Input: vector of numbers constrained to [0, 1]
   vector ordered_simplex_constrain_lp(vector y) {
    int Km1 = rows(y);
    vector[Km1 + 1] x;
    real remaining = 1; // Remaining amount to be distributed
    real base = 0; // The minimum for the next element
    for(i in 1:Km1) {
      if(y[i] <= 0 || y[i] >= 1) {
        reject("All elements of y have to be in [0,1]");
      }
      int K_prime = Km1 + 2 - i; // Number of remaining elements
      //First constrain to [0; 1 / K_prime]
      real x_cons = inv(K_prime) * y[i];
      // Jacobian for the constraint
      target += -log(K_prime);

      x[i] = base + remaining * x_cons;
      //Jacobian
      target += log(remaining);
      base = x[i];
      //We added  remaining * x_cons to each of the K_prime elements yet to be processed
      remaining -= remaining * x_cons * K_prime;
    }
    x[Km1 + 1] = base + remaining;

    return x;
 }
  
}
data {
  int<lower=1> N;
  int<lower=1> M;
}
parameters {
  vector<lower=0, upper=1>[N * M - 1] raw;
 // matrix<lower=0, upper=1>[N-1,M-1] submatrix;
}
transformed parameters {
  vector[N * M] mat_ = ordered_simplex_constrain_lp(raw);
  matrix[N, M] mat;
  
  {
  int counter = 0;
  for (n in 1:N) {
    for (m in 1:M) {
      counter += 1;
      mat[n, m] = mat_[counter];
    }
  }
  }
  
}

Testing as

library(cmdstanr)
mod <- cmdstan_model("order_mat.stan")

N <- 5
M <- 4

fit <- mod$sample(
  data = list(
    N = N,
    M = M
  ),
  parallel_chains = 4
)


matrix(fit$summary("mat")$mean, N, M)

            [,1]        [,2]       [,3]       [,4]
[1,] 0.002512877 0.005142539 0.00789818 0.01087530
[2,] 0.014030525 0.017418351 0.02096776 0.02479713
[3,] 0.028925737 0.033562539 0.03856591 0.04419234
[4,] 0.050488056 0.057605022 0.06584978 0.07563732
[5,] 0.088293001 0.104533399 0.12947168 0.17923256

If you want each probability to vary but not be constrained to sum to 1 you can use an ordered lb and ub vector.

functions {
  matrix ordered_matrix_lp(vector spine, matrix submatrix) {
    int N = rows(submatrix) + 1;
    int M = cols(submatrix) + 1;
    matrix[N,M] output;
    output[1,:] = spine[1:M]';
    for (r in 2:N) {
      output[r,M] = spine[M-1 + r];
      for (c in 1:M-1) {
        int j = M-c;
        real diff = output[r,j+1] - output[r-1,j];
        output[r,j] = output[r-1,j] + diff*submatrix[r-1,j];
        target += log(diff);
      }
    }
    return output;
  }
  
   vector ordered_lb_ub_lp (vector y, real lb, real ub) {
    int N = rows(y);
    vector[N] x;
    
    x[1] = lb + (ub - lb) * inv_logit(y[1]);
    target += log(ub - lb) + log_inv_logit(y[1]) + log1m_inv_logit(y[1]);
    
    for (i in 2:N) {
      x[i] = x[i - 1] + (ub - x[i - 1]) * inv_logit(y[i] - log(N+1-i));
      target += log(ub - x[i - 1]) + log_inv_logit(y[i] - log(N+1-i)) + log1m_inv_logit(y[i] - log(N+1-i));
    }
    
    return x;
    
  }
}
data {
  int<lower=1> N;
  int<lower=1> M;
}
parameters {
  vector[N+M-1] spine_raw;
  matrix<lower=0,upper=1>[N-1,M-1] submatrix;
}
transformed parameters {
  vector[N + M - 1] spine = ordered_lb_ub_lp(spine_raw, 0, 1);
  matrix[N,M] mat = ordered_matrix_lp(spine, submatrix);
}
          [,1]      [,2]      [,3]      [,4]
[1,] 0.0474284 0.1303922 0.2562426 0.4481193
[2,] 0.1255224 0.2670386 0.4348174 0.6357853
[3,] 0.2288692 0.4145840 0.5921260 0.7725490
[4,] 0.3685253 0.5729360 0.7373100 0.8772822
[5,] 0.5557670 0.7469333 0.8688612 0.9533293