# Estimating correlations between fitted parameters

Hey everyone,

We’re interested in finding the correlations that underlie fitted parameters for the same subjects across two experimental domains.

We have a model that gives us a posterior for a parameter of interest for each subject in each experimental domain. We have found, via a correlation recovery analysis, that when you run within-subjects correlations on the point estimates from these posteriors, the true correlations are systematically and significantly underestimated. To address this, we want to explicitly model the correlations in the model that fits our data.

Let us say that the parameter from both experimental domains is fit in the same model, and we also want to model the underlying correlation between these fitted parameters to get a sense of domain generality. What would be the correct way to write our code to accomplish this? Let us say that this is our starting point, and we’re interested in the underlying correlation of latent variables `p1` and `p2`, which respectively depend on `data1` and `data2`:

``````
parameters {
  vector[ns] p1;
  vector[ns] p2;
}
model {
  data1 ~ distribution(p1);
  data2 ~ distribution(p2);
}
``````

How would we go about writing the program to estimate this underlying correlation (assuming it exists)?

I’m afraid I don’t know what “correlation recovery analysis” is, or how point estimates could be used to compute correlations of distributions.

If you want to measure posterior correlation, all you need to do is take the draws and estimate the full posterior covariance matrix by computing the sample covariances. You can then divide by the scales to get the correlation matrix.
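For example (outside Stan; a minimal NumPy sketch with simulated draws standing in for the draws you’d extract from a fitted model):

```python
import numpy as np

# Toy "posterior": 4000 draws of two parameters (columns), standing in
# for the draws extracted from a fitted Stan model.
rng = np.random.default_rng(0)
draws = rng.multivariate_normal([0.0, 1.0],
                                [[1.0, 0.6], [0.6, 2.0]],
                                size=4000)

# Sample covariance across draws (rowvar=False: columns are variables),
# then divide by the scales to get the correlation matrix.
cov = np.cov(draws, rowvar=False)
scales = np.sqrt(np.diag(cov))
corr = cov / np.outer(scales, scales)
print(corr)  # off-diagonal entry is near 0.6 / sqrt(1 * 2)
```

`np.corrcoef(draws, rowvar=False)` does the covariance-then-scale step in one call.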

Alternatively, you can write a model that includes a parameterization of the correlations between two variables:

``````
parameters {
  array[ns] vector[2] p;
  cholesky_factor_cov[2] L_Sigma;
  vector[2] mu;
}
model {
  p ~ multi_normal_cholesky(mu, L_Sigma);
}
``````

If you don’t want the variables to be marginally normal and don’t want to transform them, then what you want to look at is the notion of a copula. I don’t even know if there are tutorials on copulas for Stan, but they’re probably out there.

The Stan User’s Guide suggests an alternative factored parameterization, where you put an LKJ prior on a correlation matrix and scale it to get the covariance.


On the copula piece, there are a bunch of posts on Discourse; the main contributors are me, @andre.pfeuffer, and @andrjohns. I’ve collected a few different copula examples in Stan code at Helpful Stan Functions: Copula Functions.


In order to estimate the recovery of underlying correlations, we created a generative model that uses a multivariate lognormal distribution to generate two vectors of lognormal(0, 1) distributed parameters (two per simulated subject) with known within-subject correlations. We then find the ground-truth correlation between those two vectors. These parameters are then used to generate two data sets per subject, one for each parameter. Each of the two data sets contains all of the subjects, and they are fit separately with Stan to get a posterior over each subject’s parameters. The geometric mean of these lognormally distributed posteriors provides point estimates for the parameters, leaving us with one vector of subject parameters for each of the correlated data sets.

We then calculate the correlation between these point estimates, the “recovered” correlation, and compare it to the ground-truth correlation. We do this for ground-truth correlations between 0 and 1, make a scatter plot of ground truth vs. recovered correlation, and run a Deming regression to get an estimate of the correlation recovery. What we found is that the Deming regression had a slope around half of unity, suggesting that the rho statistic of the correlations calculated from point estimates is systematically around half that of the underlying ground-truth rho.

We were aware of this issue before doing this analysis because we were seeing surprisingly weak correlations for things we know must be highly correlated, which prompted us to evaluate this systematic underestimation on data sets with known underlying correlations.
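The effect is easy to reproduce in miniature (a plain NumPy sketch, with multiplicative lognormal noise standing in for the per-subject estimation error rather than actual Stan fits):

```python
import numpy as np

rng = np.random.default_rng(1)
ns, true_rho = 200, 0.8

# Latent standard normals with the target correlation; exponentiate
# to get lognormal(0, 1) parameters per simulated subject.
z = rng.multivariate_normal([0.0, 0.0],
                            [[1.0, true_rho], [true_rho, 1.0]],
                            size=ns)
true_params = np.exp(z)

# Stand-in for estimation error: point estimates are the true values
# perturbed by independent lognormal noise with the same scale as the
# signal (a real run would fit each data set with Stan and take the
# geometric mean of each posterior).
estimates = true_params * np.exp(rng.normal(0.0, 1.0, size=(ns, 2)))

# The correlation of the log point estimates is attenuated toward
# true_rho * signal_var / (signal_var + noise_var) = 0.8 / 2 = 0.4.
recovered = np.corrcoef(np.log(estimates[:, 0]),
                        np.log(estimates[:, 1]))[0, 1]
print(true_rho, recovered)
```

With noise variance equal to signal variance, the recovered correlation sits near half the ground truth, matching the Deming slope described above.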

If the variables are marginally lognormal, and appear in some likelihood function, then would the following work?

``````
parameters {
  array[ns] vector[2] p;
  cholesky_factor_cov[2] L_sigma;
  vector[2] mu;
}
model {
  p ~ multi_normal_cholesky(mu, L_sigma);

  // exp() maps the latent normals to the lognormal scale;
  // p[ : , 1] selects the first component for every subject
  target += likelihood(exp(to_vector(p[ : , 1])) | data1);
  target += likelihood(exp(to_vector(p[ : , 2])) | data2);
}
``````

Also, originally this model was hierarchical:

``````
parameters {
  vector<lower=0>[ns] p;

  real mu;
  real<lower=0> sigma;
}
model {
  mu ~ normal(0, 1);
  sigma ~ lognormal(0, 1);

  p ~ lognormal(mu, sigma);

  target += likelihood(data | p);
}
``````

This was to allow for partial pooling of subject data within each data set (so `data1` and `data2` respectively). Is this still possible when modeling correlations like this?

Thank you for all of your help!

I think a copula model will be what you’re after if you want to model correlations between non-Gaussian marginals.

I’ve got some examples and background here: Copulas


So, in the simplest possible case of a hierarchical model with the correlation of only two random variables being of interest, would this work?

``````
functions {
  // bivariate normal copula CDF function
  real bivariate_normal_copula_cdf(real u, real v, real rho) {
    real a = 1 / sqrt(1 - square(rho));
    real avg_uv = mean([u, v]);
    real pu = inv_Phi(u);
    real pv = inv_Phi(v);
    real alpha_u = a * (pv / pu - rho);
    real alpha_v = a * (pu / pv - rho);
    real d = 0;

    if (v == 0.5 || u == 0.5 && v != u) {
      real x = u == 0.5 ? v : u;
      if (rho < 0)
        return 0.5 * bivariate_normal_copula_cdf(x | x, 1 - 2 * rho ^ 2);
      else
        return u - 0.5 * bivariate_normal_copula_cdf(x | x, 1 - 2 * rho ^ 2);
    }
  }
}
``````
``````
data {
  // number of participants and trials
  int<lower=1> ns;
  int<lower=1> nt;

  // experimental values
  vector[nt] oris;

  // response matrices
  array[ns] matrix[nt, 4] data1;
  array[ns] matrix[nt, 4] data2;
}
``````
``````
parameters {
  // parameters of interest
  vector<lower=0>[ns] p1;
  vector<lower=0>[ns] p2;

  // hyperparameters
  real mu1;
  real<lower=0> sigma1;
  real mu2;
  real<lower=0> sigma2;

  // correlation coefficient
  real<lower=-1, upper=1> rho;
}
``````
``````
model {
  // hyperpriors
  mu1 ~ normal(0, 1);
  sigma1 ~ lognormal(0, 1);
  mu2 ~ normal(0, 1);
  sigma2 ~ lognormal(0, 1);

  // correlation prior
  rho ~ uniform(-1, 1);

  // priors for parameters of interest
  p1 ~ lognormal(mu1, sigma1);
  p2 ~ lognormal(mu2, sigma2);

  // likelihoods for the two datasets
  target += likelihood(data1, oris, p1);
  target += likelihood(data2, oris, p2);

  // loop through participants
  for (n in 1:ns) {
    // find the CDF values
    real pcdf1 = lognormal_cdf(p1[n] | mu1, sigma1);
    real pcdf2 = lognormal_cdf(p2[n] | mu2, sigma2);
    // increment by log likelihood of the bivariate normal copula given rho
    target += log(bivariate_normal_copula_cdf(pcdf1, pcdf2, rho));
  }
}
``````

I don’t think you want the copula CDF, and it’s actually written wrong as you have it: written in Stan, it would need an Owen’s T function.

The idea is to model all of your marginals just as you were. Then use the CDF of those marginals to transform to a latent normal space. You need the `inv_Phi` function for this, and then plug the result into the normal copula. You also don’t need to put a uniform prior on `rho`, since declaring it with bounds already implies a uniform prior.

Note that this gives you the linear Pearson correlation. You could also use a Student-t copula for fatter tails, or some other copula that uses a different type of dependence.

``````
functions {
  real normal_copula_lpdf(real u, real v, real rho) {
    real rho_sq = square(rho);

    return (0.5 * rho * (-2. * u * v + square(u) * rho + square(v) * rho)) / (-1. + rho_sq)
           - 0.5 * log1m(rho_sq);
  }
}
...
model {
  ...
  // since p1 and p2 are vectors you need to loop through these
  real y1;
  real y2;
  // loop through participants
  for (n in 1:ns) {
    y1 = inv_Phi(lognormal_cdf(p1[n] | mu1, sigma1));
    y2 = inv_Phi(lognormal_cdf(p2[n] | mu2, sigma2));

    target += normal_copula_lpdf(y1 | y2, rho);
  }
}
``````
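As a sanity check on the density above (in Python with SciPy, outside Stan): the Gaussian copula log density at latent normals (u, v) should equal the bivariate normal log density minus the two standard normal marginal log densities.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

def normal_copula_lpdf(u, v, rho):
    # Same algebra as the Stan function above, in NumPy.
    rho_sq = rho ** 2
    return ((0.5 * rho * (-2.0 * u * v + u ** 2 * rho + v ** 2 * rho))
            / (-1.0 + rho_sq) - 0.5 * np.log1p(-rho_sq))

# log c(u, v) = log phi2(u, v; rho) - log phi(u) - log phi(v)
u, v, rho = 0.7, -1.2, 0.5
reference = (multivariate_normal.logpdf([u, v], mean=[0.0, 0.0],
                                        cov=[[1.0, rho], [rho, 1.0]])
             - norm.logpdf(u) - norm.logpdf(v))
print(normal_copula_lpdf(u, v, rho), reference)  # the two should agree
```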

And @andrjohns added the `std_normal_log_qf` so you can operate with the `_lcdf` which opens up a larger range of values for finite precision computation:

``````
  for (n in 1:ns) {
    y1 = std_normal_log_qf(lognormal_lcdf(p1[n] | mu1, sigma1));
    y2 = std_normal_log_qf(lognormal_lcdf(p2[n] | mu2, sigma2));

    target += normal_copula_lpdf(y1 | y2, rho);
  }
``````
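To see why the log-scale version helps (a SciPy illustration of the underflow, with the standard normal standing in for the marginal): far in the tail the plain CDF underflows to exactly zero in double precision, so `inv_Phi` of it is destroyed, while the log CDF stays finite.

```python
from scipy.stats import norm

x = -40.0
print(norm.cdf(x))             # underflows to 0.0 in double precision
print(norm.ppf(norm.cdf(x)))   # -inf: the round trip loses x entirely
# On the log scale the same quantity stays finite (around -804), so a
# log-scale quantile function like std_normal_log_qf can recover x.
print(norm.logcdf(x))
```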

If I wanted the Spearman correlation instead, would I just need to convert the y1 and y2 vectors into their ranks before entering them in the copula?

Careful doing this, because mapping from a vector’s values to its rank order is discontinuous, which can break gradient-based samplers like Stan’s HMC.


I see. In that case, would it really be the selection of copula I want to focus on to approximate (or achieve) a Spearman correlation? And, if so, any suggestions?


This seems to suggest that the Pearson and Spearman statistics are almost identical, even though that is often not the case outside of Bayesian modeling. Is this just because the correlation model is built into the estimating program through the use of a copula, and therefore less sensitive to things like outliers and nonlinearities in the relationship between the underlying parameters of interest?

It’s because these are the population-level correlations in a population assumed to have normal marginals, not sample correlations.


@jsocolar in the normal copula the marginals can be any continuous univariate distribution and the correlation follows the relationship outlined in the link above.

@Corey.Plate the relationship is somewhat surprising when you’re first introduced to copulas. It’s one of the ways that the MVN is too restrictive if you think your data have a much different dependency type. I suggest taking a look at Section 3 of https://people.math.ethz.ch/~embrecht/ftp/copchapter.pdf.
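For reference, with continuous marginals the Gaussian copula implies a closed-form population Spearman correlation, ρ_S = (6/π) arcsin(ρ/2), which stays within about 0.02 of ρ over the whole range; a small Python tabulation:

```python
import math

def spearman_from_pearson(rho):
    # Population Spearman correlation implied by a Gaussian copula with
    # latent Pearson correlation rho and continuous marginals.
    return (6.0 / math.pi) * math.asin(rho / 2.0)

for rho in [0.1, 0.3, 0.5, 0.7, 0.9]:
    print(rho, round(spearman_from_pearson(rho), 4))
```

The mapping is exact at 0 and 1 and never deviates by more than roughly 0.02 in between, which is why the two statistics look nearly identical here.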

I want to try out this more stable function, but it looks like our cluster’s version of Stan does not have it (we’re only on cmdstan-2.30.1-mpi). Is there a place where I can find the function written in Stan language to add it as a custom function?


Ah, I see. I would have to tinker with Stan under the hood to add this.

I’m not sure if I understand it correctly, but I think the initial question was partly about testing for correlation. That is, comparing a model without correlation to a model with some correlation (e.g., one imposing a copula).