# Is there a more efficient way to code this robust correlation approach

I am trying to fit a robust correlation model to estimate correlations between a large number of variables.

Is there a more efficient way to code this rather than manually specifying each element in the covariance matrix?

This is what I have for 4 (out of 147) variables:

``````data {
int<lower=1> N;  // number of observations
int<lower=1> K;  // number of variables
vector[K] x[N];  // input data: rows are observations (N), columns are the variables (K)
}

parameters {
vector[K] mu;                 // locations of the marginal t distributions
vector<lower=0>[K] sigma;       // scales of the marginal t distributions
real<lower=1> nu;             // degrees of freedom of the marginal t distributions
vector<lower=-1, upper=1>[(K-1)*K/2] rho;  // correlation coefficient
}

transformed parameters {
// Covariance matrix
cov_matrix[K] cov = [[sigma ^ 2, sigma * sigma * rho, sigma * sigma * rho, sigma * sigma * rho],
[sigma * sigma * rho, sigma ^ 2, sigma * sigma * rho, sigma * sigma * rho],
[sigma * sigma * rho, sigma * sigma * rho, sigma ^ 2, sigma * sigma * rho],
[sigma * sigma * rho, sigma * sigma * rho, sigma * sigma * rho, sigma ^ 2]];
}

model {
// Likelihood
// Bivariate Student's t-distribution instead of normal for robustness
x ~ multi_student_t(nu, mu, cov);

// Noninformative priors on all parameters
sigma ~ normal(0,2);
mu ~ normal(0, 2);
nu ~ gamma(2, 0.1);
}

generated quantities {
// Random samples from the estimated bivariate t-distribution (for assessment of fit)
vector[K] x_rand;
x_rand = multi_student_t_rng(nu, mu, cov);
}
``````

Stan has a correlation matrix type. If S is a diagonal matrix of your scales and \Omega is your correlation amtrix then your covariance matrix can be written using matrix multiples as \Sigma = S \Omega S . There’s a special function in Stan for that operation called `quad_form_diag` which is optimized. So you could replace your parameters and transformed parameters block with the following

``````parameters {
vector[K] mu;                 // locations of the marginal t distributions
vector<lower=0>[K] sigma;       // scales of the marginal t distributions
real<lower=1> nu;             // degrees of freedom of the marginal t distributions
corr_matrix[K]  Omega;
}

transformed parameters {
// Covariance matrix
}
``````

Thank you!

Is `Omega` in your parameterization equivalence to `rho` in my mine? The correlation parameters are what I’m most interested in estimating.

Yes, `Omega` is equivalent. For example, `Omega[1,2]` will be the correlation between the first and second dimensions.

Thanks again!

This is my revised code:

``````data {
int<lower=1> N;  // number of observations
int<lower=1> K;  // number of variables
vector[K] x[N];  // input data: rows are observations (N), columns are the variables (K)
}

parameters {
vector[K] mu;                 // locations of the marginal t distributions
vector<lower=0>[K] sigma;       // scales of the marginal t distributions
real<lower=1> nu;             // degrees of freedom of the marginal t distributions
corr_matrix[K]  Omega;
}

transformed parameters {
// Covariance matrix
}
model {
// Likelihood
// Bivariate Student's t-distribution instead of normal for robustness
x ~ multi_student_t(nu, mu, cov);

// Noninformative priors on all parameters
sigma ~ normal(0,2);
mu ~ normal(0, 2);
nu ~ gamma(2, 0.1);
}

generated quantities {
// Random samples from the estimated bivariate t-distribution (for assessment of fit)
vector[K] x_rand;
x_rand = multi_student_t_rng(nu, mu, cov);
}

``````

It compiles fine, but I’m unable to sample:

``````Chain 3: Rejecting initial value:
Chain 3:   Error evaluating the log probability at the initial value.
Chain 3: Exception: validate transformed params: cov is not positive definite.  (in 'modelf832e4779fc7d_Robust_Corr' at line 16)

``````

Is this a specification issue or might some initial values help?

I’m not sure why it wouldn’t be positive definite. You can try printing out the `Omega` and the `cov` with the `print()` function in Stan. Some initial values or better priors might help if the sampler is starting off in a bad part of parameter space. Stan has a prior for correlation matrices you can use described here.

I wonder if it could also be an issue of it not being positive definite up to numerical precision. In that case I’ve found that working with the Cholesky factors of the correlation and covariance matrices is often more numerical stable. In that case you would declare `cholesky_factor_corr[K] L;` and do something like `cov_matrix[K] cov = multiply_lower_tri_self_transpose(diag_pre_multiply(sigma, L))` which is SL(SL)^T = SLL^TS = S \Omega S.

I think the error was due to the data. I dropped two of the variables and everything is running smoothly.

Thank you!

Glad to hear. Was it faster than how you coded it up originally?

It was the writing that was taking ages! I was initially manually typing every element in the 49*49 covariance matrix :S