# Correlation coefficients depend on independent variable

Hello!

Is there a way in Stan to allow the correlation coefficients within a multivariate model to vary as a function of my independent variable x?

That is, I have written a model where both \mu and \sigma are some function of x and also include the cholesky decomposition of the correlation structure that implicitly assumes the correlation structure between each response remains unchanged as x varies. I combine this decomposition with \sigma using diag_pre_multiply. I provide the data and parameter blocks below that describe this. Is there a way to let L_Omega vary as a function of x like I have let the mean and standard variation vary as a function of x? In this case, x is chronological age of a cross-sectional growth sample.

Thanks!

data{
int N; // # of individuals
int K; // # of responses
vector[K] y[N]; // vector of growth responses per individual
real X[N]; //age
}
parameters{
vector<lower=0>[K] a; // multiplicative constant
vector<lower=0>[K] r; // scaling parameter
vector[K] b; // offset
vector<lower=0>[K] s_scale; //constant noise
vector<lower=0>[K] kappa; // slope/gradient of linear noise function
cholesky_factor_corr[K] L_Omega; // cholesky factor the corr matrix for efficiency
}
transformed parameters{
vector[K] mu[N]; //mean array of size N containing vectors of length K
vector[K] s[N]; //sd array of size N containing vectors of length K
matrix[K,K] Sigma[N]; // cov array of size N with K x K cov matrices
for(i in 1:N){
for(k in 1:K){
mu[i,k] = a[k]*X[i]^(r[k])+b[k]; // mean function
s[i,k] = s_scale[k]*(1 + kappa[k]*X[i]); // sd function
}
}
for(i in 1:N){
Sigma[i] = diag_pre_multiply(s[i],L_Omega); // combining the cholesky corr matrix with the noise
}
}


@jsocolar might advise on the original topic of modelling a covariateâ€™s influence on a correlation matrix. Iâ€™ve seen posts here describing hierarchical treatment of correlation matrices, but not influence by a continuous covariate.

@cwolfe can you post your model block too? Also, can you elaborate on the nature of the data? What are the K â€śresponsesâ€ť? If the intended purpose of the multivariate structure is to capture consistent differences between levels of the categorical variable â€śindividualâ€ť and between levels of the categorical variable â€śresponseâ€ť, then Iâ€™d suggest that a item-response-theory framework might be more straightforward.

1 Like

@mike-lawrence Thank you for the response!

I include the full model code below. At the most basic, I am modelling cross-sectional human growth where x is a continuous Real that equates to chronological age (from ~birth to 20 years of age in my data) and Y is a large response vector of length k. In other words, I have measured k continuous growth variables (e.g., size, breadth, etc.) I have parameterized the mean \mu by an offset power law and the noise \sigma that scales linearly demonstrating heteroskedasticity as chronological age increases. In the code below, I assume Y is distributed as a multivariate normal with a mean per response k and decomposed \Sigma to disentangle the noise from the correlation structure. A few notes, I have also fit this model with a Multivariate Studentâ€™s T to compare against the MVN model. Further, I am in the process of trying to fit the model when the data is mixed. That is, the response vector k contains both continuous and ordinal responses variables. I am trying to do this using the GHK algorithm. I have been able to fit continuous and ordinal separately, but have yet to get them to work togetherâ€¦ I have posted other times through the discourse here and here for further information about the model.

In terms of the covariance structure and/or the correlation structure, my ultimate goal here is less concerned about the parameters associated with the mean and noise. Instead, I am most concerned with the nature of the covariance structure between growth responses and how they change with age (x). As it is currently written, I have let the noise vary as a function of x, but the correlations between growth traits are assumed to remain unchanged - there is just a single correlation term for each response variable regardless of age. My domain expertise suggests \rho like \sigma, may also vary as an individual ages.

I hope all of this makes sense!

data{
int N; // # of individuals
int K; // # of responses
vector[K] y[N]; // vector of growth responses per individual
real X[N]; //age
}
parameters{
vector<lower=0>[K] a; // multiplicative constant
vector<lower=0>[K] r; // scaling parameter
vector[K] b; // offset
vector<lower=0>[K] s_scale; //constant noise
vector<lower=0>[K] kappa; // slope/gradient of linear noise function
cholesky_factor_corr[K] L_Omega; // cholesky factor the corr matrix for efficiency
}
transformed parameters{
vector[K] mu[N]; //mean array of size N containing vectors of length K
vector[K] s[N]; //sd array of size N containing vectors of length K
matrix[K,K] Sigma[N]; // cov array of size N with K x K cov matrices
for(i in 1:N){
for(k in 1:K){
mu[i,k] = a[k]*X[i]^(r[k])+b[k]; // mean function
s[i,k] = s_scale[k]*(1 + kappa[k]*X[i]); // sd function
}
}
for(i in 1:N){
Sigma[i] = diag_pre_multiply(s[i],L_Omega); // combining the cholesky corr matrix with the noise
}
}
model{
//priors
a ~ normal(0,10); // half-normal
r ~ normal(0,1); //half-normal
b ~ normal(0,10);
kappa ~ normal(0,1); //half-normal
s_scale ~ cauchy(0,5); //half-cauchy
L_Omega ~ lkj_corr_cholesky(0.5);
for(i in 1:N){
y[i] ~ multi_normal_cholesky(mu[i], Sigma[i]); //likelihood
}
}
generated quantities{
vector[K] Y_mean[N]; //posterior mean
vector[K] Y_pred[N]; //posterior predictive check
corr_matrix[K] Omega; //re-factor corr matrix
cov_matrix[K] Sigma_new[N];//put cov matrix back together
real<lower=0> a_prior = normal_rng(0,10); // a prior
real<lower=0> r_prior = normal_rng(0,1); // r prior
real b_prior = normal_rng(0,10); // b prior
real<lower=0> s_const_prior = cauchy_rng(0,5); // constant s prior
real<lower=0> kappa_prior = normal_rng(0,1); // kappa prior
real s_prior_lin[N] = s_const_prior*(1+kappa_prior*X[N]); //heterskedatsic w/ prior
vector[K] log_lik[N]; // log-likelihood for model comparison
Omega = multiply_lower_tri_self_transpose(L_Omega); //corr matrix
for(i in 1:N){
Sigma_new[i] = quad_form_diag(Omega, s[i]); //posterior cov matrix
}
for(i in 1:N){
for(k in 1:K){
Y_mean[i,k] = a[k]*X[i]^r[k] +b[k]; //posterior mean
}
Y_pred[i] = multi_normal_rng(Y_mean[i],Sigma_new[i]); //posterior predictive
}
real y_sim[N] = normal_rng(a_prior*X^(r_prior) + b_prior,s_prior_lin); // prior pred distribution
for(n in 1:N){
log_lik[n] = multi_normal_cholesky_lpdf(y[n]|mu[n],Sigma[n]); // log lik of each obs
}
}

1 Like

@cwolfe I think the million dollar question here is what functional forms of relationship you want to accommodate between the x and the correlations. Itâ€™s a hard question because the allowable functional forms are heavily constrained by the requirement for positive-semi-definite-ness.

One possibility is to bin the data and estimate independent correlation matrices for each bin. Youâ€™ll face a tradeoff between how finely resolved the bins are and whether each bin contains enough data to robustly estimate the matrix. If you want to get hack-y and non-generative, a possible refinement might be to shrink the bins but then introduce a penalty term that penalizes some metric of the difference/distance between the correlation matrices that are in adjacent bins, using this penalty to pool information across bins. Iâ€™m afraid I donâ€™t have any brilliant ideas about what sort of penalty term you might use though. Perhaps the right choice of penalty yields some nicely interpretable generative model; I donâ€™t know.

1 Like

@jsocolar That is the question! I am not quite sure of the actual functional form, I just know the correlations may change as x increases. I have actually thought about this binning route so Iâ€™m glad to see I wasnâ€™t crazy for thinking that way.

Let me ask this: If I bin the data by age group (e.g., 0-1,1-2, etc.), I could then get a matrix per group, yes? Then, I would take each bin and vary the noise by each bin so I have equal length matrices that I can then combine using diag_pre_multiply. Now, is it then possible to do an if else statement embedded within my likelihood for loop that would make sure I combine the â€ścorrectâ€ť covariance matrix with the â€ścorrectâ€ť mean? So if I have a 1.5 year, I want their mean and the covariance matrix that corresponds with a 1-2 year old. I suppose I could also bin the means as well, but Iâ€™d like to avoid collapsing to much variation if possible.

I didnâ€™t quite follow the description of your proposed code, but hereâ€™s how I would approach the binning solution:

1. Outside of Stan, create a data vector of bins which gives the age group that each observation falls into. Pass this data to Stan as a one-dimensional integer array int bin[N];
2. Declare an array L_Omega_array of correlation matrices, one per bin.
3. When you compute Sigma[i], replace L_Omega with L_Omega_array[bin[i]].
4. (Minor efficiency gain) move the computation of Sigma to the model block. Donâ€™t bother storing every value of Sigma[i], just overwrite Sigma at every iteration of the for-loop in the model block.
5. Optionally, add a penalty term that penalizes large distances between the correlation matrices from adjacent bins. Youâ€™d compute a vector of these distances, and then sample that vector from some kind of appropriate regularizing distribution (maybe a half-normal?).
1 Like

@jsocolar Thank you for this! This is all straightforward to me.

Just for clarity/so I follow:

1. I create a data vector of say length 20.

2. Declare the array as array[20] cholesky_factor_cor[K] L_Omega

3. Run the for loop in the model block as for(i in 1:N){Sigma[i] = diag_pre_multiply(s[i],L_Omega[bin[i]])}

4. Alternatively, add a penalty term.

1. To declare the array, do cholesky_factor_cor[K] L_Omega[20]
2. If you are bringing this down into the model block, you can eliminate the indexing on Sigma and just recompute/overwrite Sigma at each iteration of the for loop.
1 Like

Is the same individual measured across multiple ages?

2 Likes

@mike-lawrence No. The data are cross-sectional and not longitudinal. So one data point is an age x and a number of continuous and ordinal responses Y. This can be further subdivided by country of origin, biological sex, life history stage, etc. Eventually, I envision a large hierarchal model where I can compare growth across country, biological sex, etc. The code above is only written to capture 1 country. Once I get it to work, Iâ€™ll expand.

1 Like