Estimating Robust Correlations for Multiple (>2) Outcomes

I am attempting to estimate robust correlations by adapting This Post to 4 outcome variables (my real data have 9 outcomes).

I’m a little stuck on how to write the likelihood so that it estimates the additional scale and degrees-of-freedom parameters.

Some fake data:

sigma = c(10, 15, 10, 20)
rho = c(-0.6, 0.3, 0.6, -0.3, 0.6, -0.3)
cov.mat = matrix(c(sigma[1] ^ 2,
                   sigma[1] * sigma[2] * rho[1],
                   sigma[1] * sigma[3] * rho[2],
                   sigma[1] * sigma[4] * rho[3],
                   sigma[1] * sigma[2] * rho[1], 
                   sigma[2] ^ 2, 
                   sigma[2] * sigma[3] * rho[4],  
                   sigma[2] * sigma[4] * rho[5],
                   sigma[1] * sigma[3] * rho[2], 
                   sigma[2] * sigma[3] * rho[4], 
                   sigma[3] ^ 2, 
                   sigma[3] * sigma[4] * rho[6],
                   sigma[1] * sigma[4] * rho[3], 
                   sigma[2] * sigma[4] * rho[5],
                   sigma[3] * sigma[4] * rho[6],
                   sigma[4] ^ 2),
                 nrow=4, byrow=T)
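
To actually generate observations from this covariance, something like the following would do (the mvtnorm call and the particular N, mu, and df here are arbitrary choices for the fake data, not part of my real setup):

library(mvtnorm)  # rmvt() draws from a multivariate t; any multivariate sampler would work

N = 100
mu = c(0, 0, 0, 0)

# sigma is the scale matrix, df the degrees of freedom, delta the location shift
x = rmvt(N, sigma = cov.mat, df = 5, delta = mu)

stan_data = list(N = N, x = x)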

My current, failing attempt:

robust_code = "
data {
  int<lower=1> N;  // number of rows
  //int<lower=1> K;  // number of columns
  vector[4] x[N];  // input data: rows (N) are observations, columns (4) are the variables
}
parameters {
  vector[4] mu;                 // locations of the marginal t distributions
  real<lower=0> sigma[4];       // scales of the marginal t distributions
  real<lower=1> nu[4];             // degrees of freedom of the marginal t distributions
  real<lower=-1, upper=1> rho[6];  // correlation coefficients
}
transformed parameters {
  // Covariance matrix
  cov_matrix[4] cov = [[sigma[1] ^ 2, sigma[1] * sigma[2] * rho[1], sigma[1] * sigma[3] * rho[2], sigma[1] * sigma[4] * rho[3]],
                       [sigma[1] * sigma[2] * rho[1], sigma[2] ^ 2, sigma[2] * sigma[3] * rho[4],  sigma[2] * sigma[4] * rho[5]],
                       [sigma[1] * sigma[3] * rho[2], sigma[2] * sigma[3] * rho[4], sigma[3] ^ 2, sigma[3] * sigma[4] * rho[6]],
                       [sigma[1] * sigma[4] * rho[3], sigma[2] * sigma[4] * rho[5], sigma[3] * sigma[4] * rho[6], sigma[4] ^ 2]];
}
model {
  // Likelihood
  // Multivariate Student's t-distribution instead of a normal for robustness
  x ~ multi_student_t(nu, mu, cov);
    
  // Noninformative priors on all parameters
  sigma ~ normal(0,100);
  mu ~ normal(0, 100);
  nu ~ gamma(2, 0.1);
}

generated quantities {
  // Random samples from the estimated multivariate t-distribution (for assessment of fit)
  vector[4] x_rand;
  x_rand = multi_student_t_rng(nu, mu, cov);
}"

On compile I get an error: Ill-typed arguments to '~' statement. No distribution 'multi_student_t' was found with the correct signature.

Do I need a loop or some vectorization for the likelihood?


Does multi_student_t allow a vector for the argument nu?

It looks like mu can be a vector, nu is a real, and cov (sigma) is a matrix.

Just checked and no, nu can’t be a vector. When the doc says real it means a single real, not a vector.
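
So, with a single nu shared across the four margins, the relevant pieces would look something like this (just the lines that change; everything else stays as in your model):

parameters {
  // ... mu, sigma, and rho as before ...
  real<lower=1> nu;  // single degrees-of-freedom parameter shared by all four margins
}
model {
  x ~ multi_student_t(nu, mu, cov);
  nu ~ gamma(2, 0.1);
  // ... remaining priors as before ...
}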

Thank you!

Setting nu to a single real fixed the issue.

Would it make any sense to put a linear formula on nu to estimate a real for each outcome and then loop over those estimates?

That’s a really neat idea I’ve never seen/thought of before. Give it a shot with some fake data and check how it performs.

Note also that there's a description here of how to hack a different nu per dimension. I wish it were a built-in thing, though.
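
Roughly, the idea is to mix a multivariate normal over independent chi-square draws, one per dimension and row, so each margin ends up Student-t with its own nu while a Cholesky-factored correlation matrix carries the dependence. The sketch below is my own from-memory version, not the code from that link, and the resulting family is not the same as a single multivariate t, so Rho should be interpreted with some care:

data {
  int<lower=1> N;
  vector[4] x[N];
}
parameters {
  vector[4] mu;
  vector<lower=0>[4] sigma;
  vector<lower=1>[4] nu;        // one degrees-of-freedom parameter per dimension
  cholesky_factor_corr[4] L;    // Cholesky factor of the correlation matrix
  matrix<lower=0>[4, N] u;      // latent chi-square mixing variables, one per dimension and row
}
model {
  for (k in 1:4)
    u[k] ~ chi_square(nu[k]);
  for (n in 1:N) {
    // conditional on u, the row is multivariate normal with inflated scales,
    // so marginally each dimension is Student-t with its own nu[k]
    vector[4] s = sigma .* sqrt(nu ./ col(u, n));
    x[n] ~ multi_normal_cholesky(mu, diag_pre_multiply(s, L));
  }
  mu ~ normal(0, 100);
  sigma ~ normal(0, 100);
  nu ~ gamma(2, 0.1);
  L ~ lkj_corr_cholesky(1);
}
generated quantities {
  // recover the correlation matrix from its Cholesky factor
  matrix[4, 4] Rho = multiply_lower_tri_self_transpose(L);
}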

That looks much more reasonable than the ugly hack I was thinking of. I thought maybe I could just tuck a line into transformed parameters (and I hope I don't need any Jacobian adjustment) or, alternatively, put something into the model block.

Oh! When you were talking about a loop I thought you were thinking of looping over the N rows with a nu per row, thereby making nu a kind of outlier-ness quantity that labels each row.
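
For reference, that per-row idea is basically the scale-mixture representation of the t: keep one nu but give each row a latent weight, where a small weight flags an outlying row. A minimal sketch, reusing mu, cov, and a single real nu from the model above, with the weight vector w being something I'm adding here for illustration:

parameters {
  // ... mu, sigma, rho, cov, and a single real nu as in the model above ...
  vector<lower=0>[N] w;  // latent per-row mixing weights; small w[n] flags an outlying row
}
model {
  // integrating w out recovers x[n] ~ multi_student_t(nu, mu, cov)
  w ~ gamma(0.5 * nu, 0.5 * nu);
  for (n in 1:N)
    x[n] ~ multi_normal(mu, cov / w[n]);
}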

I’m mostly interested in the rho estimates and having those be robust to possible outliers.

Looping over rows makes more sense! I was thinking about it incorrectly.

I’m using this application to try to wrap my head around correlation estimates in raw Stan and to see if they align with the estimates from the full model where I’m predicting the row values.

The longer term goal is to expand this into a polychoric version for some psychometric evaluations of Likert-type data. But I’ve been wrestling with that for ages :D

If dealing with Likert/ordinal data, be sure to check out this and this.
