Ensure correlation matrix is positive semi-definite?

Hello,

I am trying to generate a variance-covariance matrix using the decomposition from Barnard et al 2000, where the variance-covariance matrix can be written as a product of a diagonal matrix and a correlation matrix. I want to draw the off-diagonal values of the correlation matrix from a (shifted and scaled) Beta distribution where the shape parameters are not necessarily equal to each other.

The problem I am running into is ensuring that the correlation matrix is positive definite. I’m wondering if there is a way to implement this restriction in Stan. I keep getting this error during the sampling process:

"Chain 1: Rejecting initial value:
Chain 1: Error evaluating the log probability at the initial value.
Exception: validate transformed params: S[i_0__] is not positive definite.“

Here is what my code currently looks like (I omitted some of the lines that aren’t directly related to this problem):

// construct a correlation matrix 
functions {
  
  matrix to_corr(vector x, int K) {
    matrix[K, K] y;
    int pos = 1;
    for (i in 1: K) {
      for (j in 1:(i - 1)) {
        y[i, j] = x[pos];
        y[j, i] = x[pos];
        pos += 1;
      }
      y[i, i] = 1;
    }
    //print(y);
    return y;
  }
}


data { 
  int<lower=1> Q; //  dimension of the v-cov matrices
  int<lower=1> I; //  no. of subjects
} 

transformed data {
  int<lower=0> off_diag_Q = (Q*(Q-1))/2;
}


parameters { 
  vector[Q] log_Q_Si[I]; 
  vector <lower=0, upper=1>[off_diag_Q] r_raw[I]; //values to draw from Beta distribution
  real<lower=0> hyper_alpha; // hyperparameter on the Beta prior
  real<lower=0> hyper_beta; //hyperparameter on the Beta prior
}

transformed parameters{
   cov_matrix[Q] S[I]; // subject specific vcov’s
   vector<lower=0>[Q] D_Si[I]; //non-negative diagonals of the Si decomposition
   matrix[Q,Q] R[I]; // corr matrix of Si decomposition
   
   
  for(i in 1:I){
    R[i] = to_corr(2 * r_raw[i] - 1, Q)} //shift & scale to get valid correlations between -1 and 1 
    D_Si[i] = exp(log_Q_Si[i]); 
    S[i] = quad_form_diag(R[i], D_Si[i]); //get back the S_i's 
  }

}


model {
  //hyperparameters on the S_is 
  hyper_alpha ~ exponential(0.1);
  hyper_beta ~ exponential(0.1);

for(i in 1:I) {
    r_raw[i] ~ beta(hyper_alpha, hyper_beta);
  }
}

(If it helps, I only need this for the 3x3 matrix case.)

Thank you!

The easiest thing to do would be to:

  1. Declare R to be a cor_matrix
  2. Use an LKJ prior on R (or even better to use the Cholesky decomposition of it, as described on the next page in the manual)

This will allow you to model the variance and the correlation separately.

1 Like

Hey, thanks for the answer! I’m actually trying to avoid using an LKJ prior for this particular correlation matrix. (It’s not flexible enough for what I’m trying to do with this model.) Right now I’m thinking to solve for the determinant of the correlation matrix in terms of the correlation parameters and then using that to get the constraints on each correlation parameter. (I was wondering if there’s an alternative way to get the positive-definite constraint on the matrix without doing all of this though. 😅)

Paging @mike-lawrence

Ha, I was watching this hoping that I could use the resulting solutions myself 😛

@plantbayes maybe take a look at the “pairwise” approach here, and also maybe here.

2 Likes

You can find a near PSD correlation matrix to the given priors. PSD matrices are really rare compared to all square matrices, especially so as the dimension increases.

You need to somehow get PSD by construction. Stan’s corr_matrix declaration in the parameter block will construct a valid PSD matrix. By using this I can put your priors on the lower elements of this matrix to find a corr matrix near the given priors. In the 3 x 3 case you mention, it’s likely that the priors will be close to the corr matrix entries. However, there’s still a large region of values that are not permissible. Visualized as a floating samosa in 3d:
image
(from my blog post Correlation Cats Part I – The Stan Blog )

Meaning, that there are some priors which will be non-permissible for PSD and this non-permissibility increases as the dimensions of the correlation matrix increase. You can mitigate this somewhat by choosing a sigma on the soft normal constraint to be larger (ie set sigma > 0.01).

functions {
  vector lower_elements(matrix M, int tri_size){
    int n = rows(M);
    int counter = 1;
    vector[tri_size] out;
    for (i in 2:n){
      for (j in 1:(i - 1)) {
        out[counter] = M[i, j];
        counter += 1;
      }
    }
    return out;
  }
}
data { 
  int<lower=1> Q; //  dimension of the v-cov matrices
  int<lower=1> I; //  no. of subjects
} 
transformed data {
  int<lower=0> off_diag_Q = (Q * (Q - 1))/2;
}
parameters { 
  vector<lower=0, upper=1>[off_diag_Q] r_raw[I]; //values to draw from Beta distribution
  real<lower=0> hyper_alpha; // hyperparameter on the Beta prior
  real<lower=0> hyper_beta; //hyperparameter on the Beta prior
  corr_matrix[Q] R[I];
}
model {
  //hyperparameters on the S_is 
  hyper_alpha ~ exponential(1);
  hyper_beta ~ exponential(1);

  for(i in 1:I) {
    r_raw[i] ~ beta(hyper_alpha, hyper_beta);
    lower_elements(R[i], off_diag_Q) ~ normal(2 * r_raw[i] - 1, 0.01);
  }
}
8 Likes

Thank you so much! 😃

Wow, thank you so much! I’ll definitely try this approach and see how it goes. 😊

Hi! Just wanted to update this thread in case it would be helpful for anyone else - I ended up using a similar implementation to this paper to generate my 3x3 correlation matrices: https://doi.org/10.1214/20-BA1237

It seems to run reasonably fast when implemented (although I haven’t tried it for a higher dimension than the 3x3 case). Thanks again to everyone here for your kind suggestions!

1 Like

Mind posting your code? I’d love to check if it solves the sparse-matrix scenario I’ve been banging my head on for months!

Of course! Here’s the code I used for the correlation matrix part - apologies in advance that it is probably not the most efficient way to write it - I plan to come back and make it more elegant when I have a bit of extra time ☺️ :

parameters { 
  vector[off_diag_Q] hyper_alpha;
  vector[off_diag_Q] hyper_beta;
  vector<lower=0,upper=1>[I] r_theta12;
  vector<lower=0,upper=1>[I] r_theta13;
  vector<lower=0,upper=1>[I] r_theta23;
} 

transformed parameters {
  cov_matrix[Q] S[I]; //subject specific vcov 
  matrix[Q,Q] R_Si[I]; // corr matrix for Si
  vector<lower=-1, upper=1>[off_diag_Q] r_values[I];
  vector[I] theta_12;
  vector[I] theta_13;
  vector[I] theta_23;

 for(i in 1:I){
   theta_12[i] =acos(2*(r_theta12[i])-1);
   theta_13[i] =acos(2*(r_theta13[i])-1);
   theta_23[i] = acos(2*(r_theta23[i])-1);
   
    r_values[i][1] = cos(theta_12[i]);
    r_values[i][2] = cos(theta_13[i]);
    r_values[i][3] = (sin(theta_12[i])*sin(theta_13[i])*cos(theta_23[i]))+(cos(theta_12[i])*cos(theta_13[i]));
    R_Si[i] = to_corr(r_values[i], Q);
    S[i] = quad_form_diag(R_Si[i], Q_Si[i]); //get back the S_i's after decomposing
  }
}

model {  
 for(i in 1:I) {
  r_theta12[i] ~beta(hyper_alpha[1], hyper_beta[1]);
  r_theta13[i] ~beta(hyper_alpha[2], hyper_beta[2]);
  r_theta23[i] ~beta(hyper_alpha[3], hyper_beta[3]);
} 

My version is a little different than the paper, which has the theta as angles varying over [0, pi] while I’ve just directly had them over the range of the Beta distribution (and then scaled and shifted them to be over [-1,1]). I think the important part of the algorithm is that the 3rd correlation (r_values[i][3]) has to be a function of the other two, but the angle coordinates allows you to still have them vary independently before you transform them into the correlation values. It looks like the formulas for the correlation values get more complicated past the 3x3 case but they have some examples for special matrices (banded, AR1, etc.)

1 Like