Correlation coefficients depend on independent variable

Hello!

Is there a way in Stan to allow the correlation coefficients within a multivariate model to vary as a function of my independent variable x?

That is, I have written a model where both \mu and \sigma are functions of x, and I include the Cholesky decomposition of the correlation structure, which implicitly assumes the correlations between responses remain unchanged as x varies. I combine this decomposition with \sigma using diag_pre_multiply. The data and parameters blocks below describe this. Is there a way to let L_Omega vary as a function of x, just as I have let the mean and standard deviation vary as functions of x? In this case, x is the chronological age of a cross-sectional growth sample.

Thanks!

data{
  int N; // # of individuals
  int K; // # of responses
  vector[K] y[N]; // vector of growth responses per individual
  real X[N]; //age
}
parameters{
  vector<lower=0>[K] a; // multiplicative constant
  vector<lower=0>[K] r; // scaling parameter
  vector[K] b; // offset
  vector<lower=0>[K] s_scale; //constant noise
  vector<lower=0>[K] kappa; // slope/gradient of linear noise function
  cholesky_factor_corr[K] L_Omega; // Cholesky factor of the corr matrix, for efficiency
}
transformed parameters{
  vector[K] mu[N]; //mean array of size N containing vectors of length K
  vector[K] s[N]; //sd array of size N containing vectors of length K
  matrix[K,K] Sigma[N]; // cov array of size N with K x K cov matrices
  for(i in 1:N){
    for(k in 1:K){
      mu[i,k] = a[k]*X[i]^(r[k])+b[k]; // mean function
      s[i,k] = s_scale[k]*(1 + kappa[k]*X[i]); // sd function
      }
  }
  for(i in 1:N){
    Sigma[i] = diag_pre_multiply(s[i],L_Omega); // combining the cholesky corr matrix with the noise  
      }
}
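For readers unfamiliar with the construction: diag_pre_multiply(s, L_Omega) forms diag(s) * L_Omega, which is the Cholesky factor of the covariance matrix quad_form_diag(Omega, s) = diag(s) * Omega * diag(s). A minimal NumPy check with made-up values (K = 3):

```python
import numpy as np

# Hypothetical values for illustration only
s = np.array([1.0, 2.0, 0.5])            # per-response standard deviations
Omega = np.array([[1.0, 0.3, 0.1],
                  [0.3, 1.0, 0.5],
                  [0.1, 0.5, 1.0]])      # a valid correlation matrix
L_Omega = np.linalg.cholesky(Omega)      # Cholesky factor of the correlation matrix

# Stan's diag_pre_multiply(s, L_Omega) is diag(s) @ L_Omega ...
L_Sigma = np.diag(s) @ L_Omega

# ... and L_Sigma @ L_Sigma.T recovers the full covariance matrix,
# i.e. Stan's quad_form_diag(Omega, s)
Sigma = np.diag(s) @ Omega @ np.diag(s)
assert np.allclose(L_Sigma @ L_Sigma.T, Sigma)
```

This is why passing diag_pre_multiply(s[i], L_Omega) to multi_normal_cholesky is equivalent to using the full covariance matrix, while avoiding an explicit matrix factorization at every iteration.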

@jsocolar might advise on the original topic of modelling a covariate’s influence on a correlation matrix. I’ve seen posts here describing hierarchical treatment of correlation matrices, but not influence by a continuous covariate.

@cwolfe can you post your model block too? Also, can you elaborate on the nature of the data? What are the K “responses”? If the intended purpose of the multivariate structure is to capture consistent differences between levels of the categorical variable “individual” and between levels of the categorical variable “response”, then I’d suggest that an item-response-theory framework might be more straightforward.


@mike-lawrence Thank you for the response!

I include the full model code below. At the most basic level, I am modelling cross-sectional human growth, where x is a continuous real equal to chronological age (from roughly birth to 20 years of age in my data) and Y is a large response vector of length K. In other words, I have measured K continuous growth variables (e.g., size, breadth, etc.). I have parameterized the mean \mu as an offset power law and the noise \sigma as a linear function of age, capturing the heteroskedasticity that increases with chronological age. In the code below, I assume Y is distributed as multivariate normal with a mean per response k, and I decompose \Sigma to disentangle the noise from the correlation structure. A few notes: I have also fit this model with a multivariate Student’s t to compare against the MVN model. Further, I am in the process of trying to fit the model when the data are mixed; that is, the response vector contains both continuous and ordinal response variables. I am trying to do this using the GHK algorithm. I have been able to fit continuous and ordinal separately, but have yet to get them to work together… I have posted other times through the discourse here and here for further information about the model.
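As a quick numerical illustration of that parameterization (hypothetical parameter values, one response):

```python
import numpy as np

# Hypothetical parameter values for a single response k
a, r, b = 5.0, 0.5, 10.0        # multiplicative constant, scaling exponent, offset
s_scale, kappa = 1.0, 0.2       # baseline noise and its linear gradient

x = np.array([1.0, 5.0, 10.0, 20.0])   # ages in years
mu = a * x**r + b                       # offset power-law mean
s = s_scale * (1 + kappa * x)           # sd grows linearly with age (heteroskedasticity)

print(mu)  # increasing, decelerating mean
print(s)   # → [1.2, 2.0, 3.0, 5.0]
```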

In terms of the covariance and/or correlation structure, my ultimate goal is less about the parameters associated with the mean and noise. Instead, I am most concerned with the nature of the covariance structure between growth responses and how it changes with age (x). As currently written, I have let the noise vary as a function of x, but the correlations between growth traits are assumed to remain unchanged: there is just a single correlation term for each pair of response variables regardless of age. My domain expertise suggests that \rho, like \sigma, may also vary as an individual ages.

I hope all of this makes sense!

data{
  int N; // # of individuals
  int K; // # of responses
  vector[K] y[N]; // vector of growth responses per individual
  real X[N]; //age
}
parameters{
  vector<lower=0>[K] a; // multiplicative constant
  vector<lower=0>[K] r; // scaling parameter
  vector[K] b; // offset
  vector<lower=0>[K] s_scale; //constant noise
  vector<lower=0>[K] kappa; // slope/gradient of linear noise function
  cholesky_factor_corr[K] L_Omega; // Cholesky factor of the corr matrix, for efficiency
}
transformed parameters{
  vector[K] mu[N]; //mean array of size N containing vectors of length K
  vector[K] s[N]; //sd array of size N containing vectors of length K
  matrix[K,K] Sigma[N]; // cov array of size N with K x K cov matrices
  for(i in 1:N){
    for(k in 1:K){
      mu[i,k] = a[k]*X[i]^(r[k])+b[k]; // mean function
      s[i,k] = s_scale[k]*(1 + kappa[k]*X[i]); // sd function
      }
  }
  for(i in 1:N){
    Sigma[i] = diag_pre_multiply(s[i],L_Omega); // combining the cholesky corr matrix with the noise  
      }
}
model{
  //priors
  a ~ normal(0,10); // half-normal
  r ~ normal(0,1); // half-normal
  b ~ normal(0,10);
  kappa ~ normal(0,1); // half-normal
  s_scale ~ cauchy(0,5); // half-cauchy
  L_Omega ~ lkj_corr_cholesky(0.5);
  for(i in 1:N){
    y[i] ~ multi_normal_cholesky(mu[i], Sigma[i]); // likelihood
  }
}
generated quantities{
  vector[K] Y_mean[N]; //posterior mean
  vector[K] Y_pred[N]; //posterior predictive check
  corr_matrix[K] Omega; //re-factor corr matrix
  cov_matrix[K] Sigma_new[N];//put cov matrix back together
  real<lower=0> a_prior = fabs(normal_rng(0,10)); // half-normal a prior
  real<lower=0> r_prior = fabs(normal_rng(0,1)); // half-normal r prior
  real b_prior = normal_rng(0,10); // b prior
  real<lower=0> s_const_prior = fabs(cauchy_rng(0,5)); // half-Cauchy constant s prior
  real<lower=0> kappa_prior = fabs(normal_rng(0,1)); // half-normal kappa prior
  real s_prior_lin[N]; // heteroskedastic sd using prior draws
  real y_sim[N]; // prior predictive distribution
  vector[N] log_lik; // log-likelihood for model comparison
  Omega = multiply_lower_tri_self_transpose(L_Omega); // corr matrix
  for(i in 1:N){
    Sigma_new[i] = quad_form_diag(Omega, s[i]); // posterior cov matrix
  }
  for(i in 1:N){
    for(k in 1:K){
      Y_mean[i,k] = a[k]*X[i]^r[k] + b[k]; // posterior mean
    }
    Y_pred[i] = multi_normal_rng(Y_mean[i], Sigma_new[i]); // posterior predictive
  }
  for(n in 1:N){
    s_prior_lin[n] = s_const_prior*(1 + kappa_prior*X[n]); // heteroskedastic sd w/ prior draws
    y_sim[n] = normal_rng(a_prior*X[n]^r_prior + b_prior, s_prior_lin[n]); // prior predictive draw
    log_lik[n] = multi_normal_cholesky_lpdf(y[n] | mu[n], Sigma[n]); // log lik of each obs
  }
}

@cwolfe I think the million-dollar question here is what functional forms of relationship you want to accommodate between x and the correlations. It’s a hard question because the allowable functional forms are heavily constrained by the requirement of positive semi-definiteness.
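To illustrate that constraint: the entries of a correlation matrix cannot vary independently of one another, so letting each correlation drift freely with x can leave the positive-semi-definite cone. A hypothetical NumPy example:

```python
import numpy as np

# If rho12 = rho13 = 0.9, positive semi-definiteness forces rho23 to be
# large and positive; setting rho23 = -0.9 yields an invalid "correlation matrix":
R_bad = np.array([[ 1.0,  0.9,  0.9],
                  [ 0.9,  1.0, -0.9],
                  [ 0.9, -0.9,  1.0]])
print(np.linalg.eigvalsh(R_bad).min() < 0)   # True: not positive semi-definite

# A compatible rho23 keeps the matrix valid:
R_ok = np.array([[1.0, 0.9, 0.9],
                 [0.9, 1.0, 0.8],
                 [0.9, 0.8, 1.0]])
print(np.linalg.eigvalsh(R_ok).min() >= 0)   # True: a valid correlation matrix
```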

One possibility is to bin the data and estimate independent correlation matrices for each bin. You’ll face a tradeoff between how finely resolved the bins are and whether each bin contains enough data to robustly estimate the matrix. If you want to get hack-y and non-generative, a possible refinement might be to shrink the bins but then introduce a penalty term that penalizes some metric of the difference/distance between the correlation matrices that are in adjacent bins, using this penalty to pool information across bins. I’m afraid I don’t have any brilliant ideas about what sort of penalty term you might use though. Perhaps the right choice of penalty yields some nicely interpretable generative model; I don’t know.
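One candidate for the distance metric in such a penalty (purely illustrative, not a recommendation) is the Frobenius norm of the difference between adjacent bins' correlation matrices:

```python
import numpy as np

def corr_distance(R1, R2):
    """Frobenius-norm distance between two correlation matrices --
    one possible metric for penalizing change across adjacent bins."""
    return np.linalg.norm(R1 - R2, ord="fro")

# Hypothetical 2x2 correlation matrices for two adjacent age bins
R_bin1 = np.array([[1.0, 0.5], [0.5, 1.0]])
R_bin2 = np.array([[1.0, 0.3], [0.3, 1.0]])
print(corr_distance(R_bin1, R_bin2))  # 0.2 * sqrt(2) ≈ 0.283
```

In the Stan model, the analogous quantity for each pair of adjacent bins could be sampled from a tight half-normal to pool information across bins, as suggested above.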


@jsocolar That is the question! I am not quite sure of the actual functional form, I just know the correlations may change as x increases. I have actually thought about this binning route so I’m glad to see I wasn’t crazy for thinking that way.

Let me ask this: if I bin the data by age group (e.g., 0-1, 1-2, etc.), I could then get a correlation matrix per group, yes? Then I would let the noise vary within each bin so I have matrices of equal dimension that I can combine using diag_pre_multiply. Is it then possible to use an if-else statement embedded within my likelihood for-loop to make sure I combine the “correct” covariance matrix with the “correct” mean? So if I have a 1.5-year-old, I want their mean paired with the covariance matrix that corresponds to a 1-2 year old. I suppose I could also bin the means as well, but I’d like to avoid collapsing too much variation if possible.

@cwolfe I didn’t quite follow the description of your proposed code, but here’s how I would approach the binning solution:

  1. Outside of Stan, create a data vector of bins which gives the age group that each observation falls into. Pass this data to Stan as a one-dimensional integer array int bin[N];
  2. Declare an array L_Omega_array of correlation matrices, one per bin.
  3. When you compute Sigma[i], replace L_Omega with L_Omega_array[bin[i]].
  4. (Minor efficiency gain) move the computation of Sigma to the model block. Don’t bother storing every value of Sigma[i], just overwrite Sigma at every iteration of the for-loop in the model block.
  5. Optionally, add a penalty term that penalizes large distances between the correlation matrices from adjacent bins. You’d compute a vector of these distances, and then sample that vector from some kind of appropriate regularizing distribution (maybe a half-normal?).
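Step 1 above can be sketched in Python (hypothetical yearly bin edges; np.digitize plus an offset gives the 1-based indices Stan expects):

```python
import numpy as np

# Hypothetical yearly bins for ages 0-20: edges at 1, 2, ..., 19 years,
# giving 20 bins in total
edges = np.arange(1, 20)

age = np.array([0.4, 1.5, 1.9, 7.2, 19.6])   # example ages
bin_idx = np.digitize(age, edges) + 1        # 1-based bin indices for Stan
print(bin_idx)  # → [ 1  2  2  8 20]
```

The resulting integer array is what gets passed as int bin[N]; in the data block, so that L_Omega_array[bin[i]] selects the Cholesky factor matching observation i's age group.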

@jsocolar Thank you for this! This is all straightforward to me.

Just for clarity/so I follow:

  1. I create a data vector of say length 20.

  2. Declare the array as array[20] cholesky_factor_corr[K] L_Omega

  3. Run the for loop in the model block as for(i in 1:N){Sigma[i] = diag_pre_multiply(s[i],L_Omega[bin[i]])}

  4. Alternatively, add a penalty term.

  1. To declare the array, do cholesky_factor_corr[K] L_Omega[20]
  2. If you are bringing this down into the model block, you can eliminate the indexing on Sigma and just recompute/overwrite Sigma at each iteration of the for loop.

Is the same individual measured across multiple ages?


@mike-lawrence No. The data are cross-sectional, not longitudinal. So one data point is an age x and a number of continuous and ordinal responses Y. These can be further subdivided by country of origin, biological sex, life history stage, etc. Eventually, I envision a large hierarchical model where I can compare growth across country, biological sex, etc. The code above is only written to capture one country. Once I get it to work, I’ll expand.
