Dealing with constraints on correlations

I am wondering how Stan handles constraints on correlations. For example, with a 3 by 3 matrix, the three correlations cannot each be uniform on (-1, 1) independently. Yet when I fit a correlation matrix of this size, I do not receive any error unless I include a generated quantities block, which then reports that a value has been rejected. I take this to mean that the matrix is checked for positive definiteness and the draw is rejected if the check fails, before proceeding. Hence the sampler does not throw an error and stop.

Note that this model is a simple example, which I hope to scale up to a 4 by 4 random-effects covariance matrix. Also, while I do know that the LKJ prior is currently recommended by the Stan team, in my application I need independent priors for each correlation.
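To see why rejections are expected here, a quick numerical sketch (numpy, just for illustration, not part of the Stan model): if the three pairwise correlations are drawn independently and uniformly from (-1, 1), a sizeable fraction of the resulting 3 by 3 correlation matrices are not positive definite, and those are exactly the draws the sampler has to reject.

```python
import numpy as np

rng = np.random.default_rng(0)

def is_pd(r12, r13, r23):
    """True if the 3x3 correlation matrix with these off-diagonals is positive definite."""
    R = np.array([[1.0, r12, r13],
                  [r12, 1.0, r23],
                  [r13, r23, 1.0]])
    return np.all(np.linalg.eigvalsh(R) > 0)

# Draw each correlation independently from Uniform(-1, 1)
draws = rng.uniform(-1, 1, size=(100_000, 3))
valid = np.mean([is_pd(*d) for d in draws])
print(f"fraction of draws that are positive definite: {valid:.3f}")
```

On this run, well over a third of the independent draws fail the positive-definiteness check.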

model <- '
// Pearson Correlation
data {
  int<lower=0> n;
  vector[3] x[n];
}
parameters {
  vector[3] mu;
  vector<lower=0>[3] sigma;
  vector<lower=-1,upper=1>[3] r;
}
transformed parameters {
  cov_matrix[3] S;
  // Reparameterization
  S[1,1] = square(sigma[1]);
  S[1,2] = r[1] * sigma[1] * sigma[2];
  S[2,1] = r[1] * sigma[1] * sigma[2];

  S[1,3] = r[2] * sigma[1] * sigma[3];
  S[3,1] = r[2] * sigma[3] * sigma[1];

  S[2,3] = r[3] * sigma[2] * sigma[3];
  S[3,2] = r[3] * sigma[3] * sigma[2];

  S[2,2] = square(sigma[2]);
  S[3,3] = square(sigma[3]);
}
model {
  // Priors
  mu ~ normal(0, 100);
  sigma[1] ~ cauchy(0, 3);
  sigma[2] ~ cauchy(0, 3);
  sigma[3] ~ cauchy(0, 3);
  r[1] ~ uniform(-1, 1);
  r[2] ~ uniform(-1, 1);
  r[3] ~ uniform(-1, 1);

  // Data
  x ~ multi_normal(mu, S);
}
generated quantities {
  vector[3] ppc;
  ppc = multi_normal_rng(mu, S);
}'

Here is the R code:

library(rstan)

x <- scale(mtcars[, 1:3])

mod <- stan_model(model_code = model, verbose = TRUE)

fit <- sampling(mod, data = list(x = x, n = nrow(x)))


Hi,
I would expect that declaring cov_matrix in transformed parameters results in a check of symmetry and positive definiteness at the end of the block, but maybe positive definiteness is not checked there. @Bob_Carpenter likely has a better understanding of what exactly is tested and when.

Generally the way to go is to construct your transformed parameters so that they always satisfy the constraints you put on them. If you need to compute a covariance matrix, make sure that for any value of sigma and r, S is symmetric and positive definite (one construction that achieves this is shown in the manual at https://mc-stan.org/docs/2_21/reference-manual/covariance-matrices-1.html).

Fully independent priors on the correlations are IMHO an impossible goal. Take for example a covariance matrix of the form:

\begin{pmatrix} 2 & 1 & x \\ 1 & 2 & 1 \\ x & 1 & 2 \end{pmatrix}

This is a valid covariance matrix only for -1 < x < 2: the first two leading principal minors are 2 and 3, and the determinant is 4 + 2x - 2x^2 = -2(x - 2)(x + 1), which is positive exactly on that interval. So once some correlations are known, the others are constrained, i.e. the correlations are not independent.
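A quick numerical confirmation of that interval (a numpy sketch, just for illustration):

```python
import numpy as np

def is_pd(x):
    """True if the example covariance matrix is positive definite for this x."""
    M = np.array([[2.0, 1.0, x],
                  [1.0, 2.0, 1.0],
                  [x,   1.0, 2.0]])
    return np.all(np.linalg.eigvalsh(M) > 0)

print(is_pd(0.0), is_pd(1.9), is_pd(-0.9))  # inside (-1, 2): all True
print(is_pd(2.1), is_pd(-1.1))              # outside (-1, 2): both False
```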

Hope that helps!


You can try to build the Cholesky-decomposed correlation (and covariance) matrix directly. Check out the Wikipedia page on the Cholesky decomposition to see what goes where. This way you can put priors on the "primitive" parameters, i.e. the individual rhos and sigmas. In the case above it would be something like this:

data {
  
  int<lower=0> n;
  vector[3] x[n];
  
}
parameters {
  
  vector[3] mu;
  vector<lower=0>[3] sigma;
  real<lower=0,upper=1> rho_raw[3];
  
}
transformed parameters{
  
  real<lower=-1,upper=1> rho[3];
  
  matrix[3,3] L_Omega = rep_matrix(0.0, 3, 3);
  matrix[3,3] L_Sigma;
  matrix[3,3] Omega;
  matrix[3,3] Sigma;
  
  for(i in 1:3)
    rho[i] = 2*(rho_raw[i] - 0.5);

  // this is very explicit -- you can probably write a concise function for this...
  L_Omega[1,1] = 1;
  L_Omega[2,1] = rho[1];
  L_Omega[3,1] = rho[2];
  
  L_Omega[2,2] = sqrt(1 - square(rho[1]));
  L_Omega[3,2] = (rho[3] - rho[1]*rho[2]) / L_Omega[2,2];
  // Note: the argument of this sqrt can go negative (giving NaN and a rejected
  // draw) when the implied correlation matrix is not positive definite.
  L_Omega[3,3] = sqrt(1 - square(L_Omega[3,1]) - square(L_Omega[3,2]));
  
  L_Sigma = diag_pre_multiply(sigma, L_Omega);
  
  Omega = multiply_lower_tri_self_transpose(L_Omega);
  Sigma = quad_form_diag(Omega, sigma);
  
}
model {
  
  x ~ multi_normal_cholesky(mu, L_Sigma);
  
  rho_raw[1] ~ beta(3,2);
  rho_raw[2] ~ beta(5,7);
  rho_raw[3] ~ beta(3,4);
  
  sigma[1] ~ cauchy(0,3);
  sigma[2] ~ cauchy(0,3);
  sigma[3] ~ cauchy(0,3);
  
  mu ~ normal(0, 100);
    
}
generated quantities{
  
  vector[3] ppc = multi_normal_rng(mu, Sigma);
  
}
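To sanity-check the explicit construction above, here is a small numpy sketch (not part of the original answer) that mirrors the Stan code and verifies that multiplying the factor by its transpose recovers the intended correlation matrix for one valid choice of rhos:

```python
import numpy as np

def chol_corr_3x3(rho):
    """Lower Cholesky factor of a 3x3 correlation matrix, mirroring the Stan code."""
    r12, r13, r23 = rho
    L = np.zeros((3, 3))
    L[0, 0] = 1.0
    L[1, 0] = r12
    L[2, 0] = r13
    L[1, 1] = np.sqrt(1 - r12**2)
    L[2, 1] = (r23 - r12 * r13) / L[1, 1]
    # NaN here if the implied correlation matrix is not positive definite
    L[2, 2] = np.sqrt(1 - L[2, 0]**2 - L[2, 1]**2)
    return L

L = chol_corr_3x3([0.5, 0.3, 0.4])
Omega = L @ L.T   # off-diagonals recover the rhos; diagonal is all ones
print(np.round(Omega, 6))
```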

To find a suitable prior for rho_raw I usually do something like this:

N <- 1e5
hist(2 * (rbeta(N, 3, 4) - 0.5)) # switch out 3, 4 to explore other priors
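The same prior-predictive check can be sketched in Python (numpy, for anyone not working in R; the Beta parameters are just the ones from the snippet above):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100_000

# Matches rho = 2*(rho_raw - 0.5) in the Stan code above
rho = 2 * (rng.beta(3, 4, size=N) - 0.5)

# Beta(3,4) has mean 3/7, so rho centers near 2*(3/7 - 1/2) = -1/7
print(f"range: ({rho.min():.3f}, {rho.max():.3f}), mean: {rho.mean():.3f}")
```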

Cheers!
