Is there a more efficient way to code this robust correlation approach

JLC · October 30, 2021, 12:49am

I am trying to fit a robust correlation model to estimate correlations between a large number of variables.

Is there a more efficient way to code this rather than manually specifying each element in the covariance matrix?

This is what I have for 4 (out of 147) variables:

data {
    int<lower=1> N;  // number of observations
    int<lower=1> K;  // number of variables
    vector[K] x[N];  // input data: rows are observations (N), columns are the variables (K)
}

parameters {
    vector[K] mu;                 // locations of the marginal t distributions
    vector<lower=0>[K] sigma;       // scales of the marginal t distributions
    real<lower=1> nu;             // degrees of freedom of the marginal t distributions
    vector<lower=-1, upper=1>[(K-1)*K/2] rho;  // correlation coefficient
}

transformed parameters {
    // Covariance matrix
    cov_matrix[K] cov = [[sigma[1] ^ 2, sigma[1] * sigma[2] * rho[1], sigma[1] * sigma[3] * rho[2], sigma[1] * sigma[4] * rho[3]],
                         [sigma[1] * sigma[2] * rho[1], sigma[2] ^ 2, sigma[2] * sigma[3] * rho[4], sigma[2] * sigma[4] * rho[5]],
                         [sigma[1] * sigma[3] * rho[2], sigma[2] * sigma[3] * rho[4], sigma[3] ^ 2, sigma[3] * sigma[4] * rho[6]],
                         [sigma[1] * sigma[4] * rho[3], sigma[2] * sigma[4] * rho[5], sigma[3] * sigma[4] * rho[6], sigma[4] ^ 2]];
}

model {
  // Likelihood
  // Bivariate Student's t-distribution instead of normal for robustness
  x ~ multi_student_t(nu, mu, cov);
    
  // Noninformative priors on all parameters
  sigma ~ normal(0,2);
  mu ~ normal(0, 2);
  nu ~ gamma(2, 0.1);
}

generated quantities {
  // Random samples from the estimated bivariate t-distribution (for assessment of fit)
  vector[K] x_rand;
  x_rand = multi_student_t_rng(nu, mu, cov);
}

arya · October 30, 2021, 11:41pm

Stan has a correlation matrix type. If S is a diagonal matrix of your scales and \Omega is your correlation amtrix then your covariance matrix can be written using matrix multiples as \Sigma = S \Omega S . There’s a special function in Stan for that operation called quad_form_diag which is optimized. So you could replace your parameters and transformed parameters block with the following

parameters {
    vector[K] mu;                 // locations of the marginal t distributions
    vector<lower=0>[K] sigma;       // scales of the marginal t distributions
    real<lower=1> nu;             // degrees of freedom of the marginal t distributions
    corr_matrix[K]  Omega;
}

transformed parameters {
    // Covariance matrix
    cov_matrix[K] cov = quad_form_diag(Omega, sigma);
}

JLC · October 30, 2021, 11:58pm

Thank you!

Is Omega in your parameterization equivalence to rho in my mine? The correlation parameters are what I’m most interested in estimating.

arya · October 31, 2021, 12:11am

Yes, Omega is equivalent. For example, Omega[1,2] will be the correlation between the first and second dimensions.

JLC · October 31, 2021, 12:24am

Thanks again!

This is my revised code:

data {
    int<lower=1> N;  // number of observations
    int<lower=1> K;  // number of variables
    vector[K] x[N];  // input data: rows are observations (N), columns are the variables (K)
}

parameters {
    vector[K] mu;                 // locations of the marginal t distributions
    vector<lower=0>[K] sigma;       // scales of the marginal t distributions
    real<lower=1> nu;             // degrees of freedom of the marginal t distributions
    corr_matrix[K]  Omega;
}

transformed parameters {
    // Covariance matrix
    cov_matrix[K] cov = quad_form_diag(Omega, sigma);
}
model {
  // Likelihood
  // Bivariate Student's t-distribution instead of normal for robustness
  x ~ multi_student_t(nu, mu, cov);
    
  // Noninformative priors on all parameters
  sigma ~ normal(0,2);
  mu ~ normal(0, 2);
  nu ~ gamma(2, 0.1);
}

generated quantities {
  // Random samples from the estimated bivariate t-distribution (for assessment of fit)
  vector[K] x_rand;
  x_rand = multi_student_t_rng(nu, mu, cov);
}

It compiles fine, but I’m unable to sample:

Chain 3: Rejecting initial value:
Chain 3:   Error evaluating the log probability at the initial value.
Chain 3: Exception: validate transformed params: cov is not positive definite.  (in 'modelf832e4779fc7d_Robust_Corr' at line 16)

Is this a specification issue or might some initial values help?

arya · October 31, 2021, 12:42am

I’m not sure why it wouldn’t be positive definite. You can try printing out the Omega and the cov with the print() function in Stan. Some initial values or better priors might help if the sampler is starting off in a bad part of parameter space. Stan has a prior for correlation matrices you can use described here.

I wonder if it could also be an issue of it not being positive definite up to numerical precision. In that case I’ve found that working with the Cholesky factors of the correlation and covariance matrices is often more numerical stable. In that case you would declare cholesky_factor_corr[K] L; and do something like cov_matrix[K] cov = multiply_lower_tri_self_transpose(diag_pre_multiply(sigma, L)) which is SL(SL)^T = SLL^TS = S \Omega S.

JLC · October 31, 2021, 4:38pm

I think the error was due to the data. I dropped two of the variables and everything is running smoothly.

Thank you!

arya · October 31, 2021, 9:48pm

Glad to hear. Was it faster than how you coded it up originally?

JLC · October 31, 2021, 9:52pm

It was the writing that was taking ages! I was initially manually typing every element in the 49*49 covariance matrix :S

Topic		Replies	Views
Estimating Robust Correlations for Multiple (>2) Outcomes Modeling rstan , techniques , specification	10	390	June 2, 2021
Computing Variance Covariance Matrix in a Loop in Rstan Modeling rstan	2	733	April 30, 2021
Updating Gaussian multiple regression code and building a robust model Modeling rstan	4	425	May 12, 2021
Constraining covariance matrix to two parameters (Contemporaneous Correlation) Modeling rstan , performance , covariance	0	749	December 9, 2021
Most efficient way to re-use correlation/covariance matrix multiple times Modeling	1	543	December 3, 2021

Is there a more efficient way to code this robust correlation approach

Related topics