Modeling Multiple Correlations Using Multiple Copulas in the Same Model Code

I have had some success modeling relationships between hierarchically determined parameters using a normal copula. The successful model looks something like this (irrelevant parts removed):

functions {
 //Custom copula function
 real normal_copula(real u, real v, real rho) {
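  //log density of the bivariate Gaussian copula at standard-normal scores u and v:
  // -0.5 * log(1 - rho^2) - (rho^2 * (u^2 + v^2) - 2 * rho * u * v) / (2 * (1 - rho^2))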

  real rho_sq = square(rho);
  
  return (0.5 * rho * (-2. * u * v + square(u) * rho + square(v) * rho)) / (-1. + rho_sq) - 0.5 * log1m(rho_sq);

 }

}
parameters {

 // Hyperparameters
 real mu_p;
 real<lower=0> sigma_p;
 real mu_v;
 real<lower=0> sigma_v;

 // Parameters
 vector<lower=0>[ns] meta_p;
 vector<lower=0>[ns] meta_v;

 real<lower=-1,upper=1> rho;

}

model {

 // hyperpriors
 mu_p ~ normal(0,1);
 sigma_p ~ lognormal(0,1);
 mu_v ~ normal(0,1);
 sigma_v ~ lognormal(0,1);
 
 // priors
 meta_p ~ lognormal(mu_p,sigma_p);
 meta_v ~ lognormal(mu_v,sigma_v);
 
 // likelihoods
 target += likelihood(data_p,meta_p);
 target += likelihood(data_v,meta_v);
 
 // normal copula
 real y_p;
 real y_v;

 for (n in 1:ns) {

  y_p = inv_Phi(lognormal_cdf(meta_p[n],mu_p,sigma_p));
  y_v = inv_Phi(lognormal_cdf(meta_v[n],mu_v,sigma_v));

  target += normal_copula(y_p,y_v,rho);

 }

}

This was then extended to a model with conditions in one of the data sets:

parameters {

 // Hyperparameters
 vector[ncond] mu_p;
 vector<lower=0>[ncond] sigma_p;
 real mu_v;
 real<lower=0> sigma_v;

 // Parameters
 matrix<lower=0>[ncond,ns] meta_p;
 vector<lower=0>[ns] meta_v;

 vector<lower=-1,upper=1>[ncond] rho;

}

model {

 // hyperpriors
 mu_p ~ normal(0,1);
 sigma_p ~ lognormal(0,1);
 mu_v ~ normal(0,1);
 sigma_v ~ lognormal(0,1);
 
 // priors
 meta_v ~ lognormal(mu_v,sigma_v);
 for (cond in 1:ncond) {

  meta_p[cond] ~ lognormal(mu_p[cond],sigma_p[cond]);

 }
 
 // likelihoods
 target += likelihood(data_v,meta_v);
 for (cond in 1:ncond) {

  target += likelihood(data_p[cond],meta_p[cond]);

 }
 
 // normal copula
 vector[ncond] y_p;
 real y_v;

 for (n in 1:ns) {

  y_v = inv_Phi(lognormal_cdf(meta_v[n],mu_v,sigma_v));

  for (cond in 1:ncond) {

   y_p[cond] = inv_Phi(lognormal_cdf(meta_p[cond,n],mu_p[cond],sigma_p[cond]));

   target += normal_copula(y_p[cond],y_v,rho[cond]);

  }

 }

}

And this seemed to work fine (though the correlations were strong enough to possibly be concerning), but we also wanted the correlations between conditions:

parameters {

 // Hyperparameters
 vector[4] mu_p;
 vector<lower=0>[4] sigma_p;

 // Parameters
 matrix<lower=0>[4,ns] meta_p;

 vector<lower=-1,upper=1>[6] rho;

}

model {

 // hyperpriors
 mu_p ~ normal(0,1);
 sigma_p ~ lognormal(0,1);
 
 // priors
 for (cond in 1:4) {

  meta_p[cond] ~ lognormal(mu_p[cond],sigma_p[cond]);

 }
 
 // likelihoods
 for (cond in 1:4) {

  target += likelihood(data_p[cond],meta_p[cond]);

 }
 
 // normal copula
 vector[4] y_p;

 for (n in 1:ns) {

  for (cond in 1:4) {

   y_p[cond] = inv_Phi(lognormal_cdf(meta_p[cond,n],mu_p[cond],sigma_p[cond]));

  }

  target += normal_copula(y_p[1],y_p[2],rho[1]);
  target += normal_copula(y_p[1],y_p[3],rho[2]);
  target += normal_copula(y_p[1],y_p[4],rho[3]);
  target += normal_copula(y_p[2],y_p[3],rho[4]);
  target += normal_copula(y_p[2],y_p[4],rho[5]);
  target += normal_copula(y_p[3],y_p[4],rho[6]);

 }

}

And this is where the model totally broke, failing to even remotely converge for any of the model parameters.

I have two questions:

  1. with respect to using the normal copula for multiple correlations in the second model – should these unexpectedly strong correlations give us pause? Or, even more to the point, is having multiple copulas that depend on the same data_v but different data_p[cond] doing something to inflate the correlations?

  2. is this related to why the third model is failing, or is there potentially just some more fundamental coding issue that I’m missing in the way the full model is written? That is, does the failure to converge arise because fitting all of the correlations amongst the four data conditions with six copulas is an improper modeling approach, or is the way we’re modeling our correlations fine, with some other coding problem causing the convergence issues?

I’m not sure exactly what’s going on because this is a complicated model, but there are a few things in your model that are problematic.

For the above, you’re using a centered parameterization, which is challenging geometrically unless there’s a lot of data. Instead, you probably want a non-centered parameterization, something like this:

parameters {
  vector<offset=mu_v, multiplier=sigma_v>[ns] log_meta_v;
...
transformed parameters {
  vector[ns] meta_v = exp(log_meta_v);
...
model {
  log_meta_v ~ normal(mu_v, sigma_v);

And same for meta_p.
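For the matrix-valued meta_p, one explicit way to non-center is with standard normal innovations, with meta_p moving from parameters to transformed parameters. A minimal sketch, assuming the second model’s declarations (the name z_p is made up here):

parameters {
  matrix[ncond,ns] z_p; //standard normal innovations
...
transformed parameters {
  matrix<lower=0>[ncond,ns] meta_p;
  for (cond in 1:ncond) {
    meta_p[cond] = exp(mu_p[cond] + sigma_p[cond] * z_p[cond]);
  }
...
model {
  to_vector(z_p) ~ std_normal();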

For efficiency, you could try inverse transforming with the logistic inverse cdf (i.e., the logit function), which is more arithmetically stable and efficient than inv_Phi (though it gives a slightly different model).
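For instance, a minimal sketch of that substitution in the first model’s copula loop (the pseudo-scores then have logistic rather than standard normal margins, so feeding them to normal_copula defines a slightly different dependence model):

for (n in 1:ns) {
  real y_p = logit(lognormal_cdf(meta_p[n],mu_p,sigma_p));
  real y_v = logit(lognormal_cdf(meta_v[n],mu_v,sigma_v));
  target += normal_copula(y_p,y_v,rho);
}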

For style, declare variables locally, as here:

for (n in 1:ns) {
  real y_p = inv_Phi(...);
  real y_v = inv_Phi(...);
  target += ...
}

It’s easier to read and just as efficient. It can be useful to define vectors outside the loop because those have allocation costs.

Another angle might be to try to do some algebraic reductions in the sequence

normal_copula(inv_Phi(lognormal_cdf(...)), ...)

but none pop out at me.

I see, but there’s nothing un-kosher about using six normal copulas to model all of the dependencies between four marginal parameter distributions?

That is, even though this is not modeled as a multivariate normal with a correlation matrix (or a Cholesky factor), modeling things in this way shouldn’t result in anything untoward happening with the estimation of the rho values?

(As an aside, I can try to improve the posterior geometry if it turns out that there is nothing else obviously wrong with the code, but this is actually just a copula extension of a BHM that we know performs well. Even the second model in the first post converges, where marginal distributions of parameters are fit to data from four conditions in one experiment, each sharing a copula with the marginal distribution of the parameters fit to data from another experiment. But if the third model is just the tipping point in geometric complexity, I can try to address that with the reparameterization you suggest.)

The main thing I’m concerned about, however, is if it’s okay or not to model multiple copulas in this way in general.

This is our model that does seem to converge:

functions {

 //Custom quantile function for lognormal distribution
 matrix log_normal_qf(vector x, vector m, vector s) {

  //takes a vector sampling the x-axis in terms of cumulative density, plus vectors of mu and sigma parameters
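  //uses the identity inv_Phi(p) = sqrt(2) * inv_erfc(2 - 2 * p), so each entry equals exp(m + s * inv_Phi(x))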
  return exp(rep_matrix(s' * sqrt(2), size(x)) .* rep_matrix(inv_erfc(-2 * x + 2), size(m)) + rep_matrix(m', size(x)));

 }

 //Custom copula function
 real normal_copula(real u, real v, real rho) {

  real rho_sq = square(rho);
  
  return (0.5 * rho * (-2. * u * v + square(u) * rho + square(v) * rho)) / (-1. + rho_sq) - 0.5 * log1m(rho_sq);

 }

 //Likelihood model for CASANDRE
 real casandre_log(array[] matrix resps, vector guess, matrix sds, matrix sm, vector nconf, vector pconf) {

  //declare and initialize log likelihood variable
  real llhC;
  llhC = 0;

  //loop over participants to calculate log likelihood
  for (n in 1:size(resps)) {

   //declare the index-sizing variable
   array[rows(resps[n])] int m;

   //build the logical vector of non-zero trials
   for (tt in 1:size(m)) {
      m[tt] = sum(resps[n][tt]) == 1;
   }

   //declare the likelihood variable
   matrix[rows(sm),4] lhC;

   //check if subject has any non-zero data before running
   if (sum(resps[n]) != 0) {

    //declare incrementing and index variables
    int t;
     array[sum(m)] int ind;

    //initialize incrementing variable
    t = 1;

    //declare and calculate mu parameters for normal cdfs
    matrix[rows(sm),rows(sds)] avgs;
    avgs = rep_matrix(sm[,n],rows(sds)).*rep_matrix(sds[,n]',rows(sm));
  
    //loop over trials
    for (tr in 1:rows(sm)){

     //check if trial has any non-zero data before running
     if (sum(resps[n][tr]) != 0) {

      //declare sample vector
      matrix[3,rows(sds)] raws;

      //sample the cumulative density of normals along the transformed x-axis for each confidence bin
      for (rws in 1:rows(sds)) {
       raws[1,rws] = normal_cdf(-nconf[n],avgs[tr,rws],sds[rws,n]);
      }
      for (rws in 1:rows(sds)) {
       raws[2,rws] = normal_cdf(0,avgs[tr,rws],sds[rws,n]);
      }
      for (rws in 1:rows(sds)) {
       raws[3,rws] = normal_cdf(pconf[n],avgs[tr,rws],sds[rws,n]);
      }

      //declare the cumulative likelihood variable
      vector[3] ratiodist;

      //take the mean of the sampled densities
      ratiodist[1] = mean(raws[1]);
      ratiodist[2] = mean(raws[2]);
      ratiodist[3] = mean(raws[3]);

      //implement a lapse rate parameter to absorb unmodeled noise, and calculate likelihoods of each choice possibility in the trial
      lhC[tr,1] = ((guess[n]/4) + ((1-guess[n])*ratiodist[1]));
      lhC[tr,2] = ((guess[n]/4) + ((1-guess[n])*(ratiodist[2]-ratiodist[1])));
      lhC[tr,3] = ((guess[n]/4) + ((1-guess[n])*(ratiodist[3]-ratiodist[2])));
      lhC[tr,4] = ((guess[n]/4) + ((1-guess[n])*(1-ratiodist[3])));

      //add this trial to the index
      ind[t] = tr;

      //increment the index
      t = t + 1;

     }

    }

    //multiply the choice matrix by the log of the likelihood matrix
    llhC += sum(columns_dot_product(resps[n][ind], log(lhC[ind])));

   }
  
  }

  //return the log likelihood for all data given the sampled parameters
  return llhC;
  
 }

 //Partial sum function
 real partial_sum_casandre_log(array[] matrix slice_n_resps, int start, int end, vector guess, matrix sds, matrix sm, vector nconf, vector pconf) {

 //dynamically slice data into the log likelihood function
 return casandre_log(slice_n_resps, guess[start:end], sds[:,start:end], sm[:,start:end], nconf[start:end], pconf[start:end]);

 }
  
}
data {

 //declare the number of subjects
 int ns;

 //declare the number of trials in the largest unpadded data set for each experiment
 int nt_p;
 int nt_v;

 //declare the number of conditions and number of contrast levels in the perceptual domain
 int ncond;
 int ncont;

 //declare the response matrix (one subject per array) for perceptual domain
 array[ncond,ncont,ns] matrix[nt_p,4] resps_p;

 //declare the response matrix (one per subject) for the value-based domain
 array[ns] matrix[nt_v,4] resps_v;

 //declare the experimental values matrices (one subject per matrix in each array) for perceptual domain
 array[ncond,ncont] matrix[nt_p,ns] vals_p;

 //declare the experimental values matrices (one subject per matrix) for value-based domain
 matrix[nt_v,ns] vals_v;

 //declare the number of x-axis samples and the vector sampling the x-axis in terms of cumulative density
 int sampn;
 vector[sampn] sampx;

}
parameters {

 //Perceptual parameters
 vector<lower=0,upper=1>[ns] guess_p; //lapse rate
 matrix<lower=0>[ncont,ns] sens_p; //stimulus sensitivities
 matrix[ncont,ns] crit_p; //decision criteria
 matrix<lower=0>[ncond,ns] meta_p; //meta-uncertainties
 matrix<lower=0>[ncond,ns] nconf_p; //negative confidence criteria
 matrix<lower=0>[ncond,ns] pconf_p; //positive confidence criteria

 //Value-based parameters
 vector<lower=0,upper=1>[ns] guess_v; //lapse rate
 vector<lower=0>[ns] sens_v; //stimulus sensitivities
 vector<lower=0>[ns] meta_v; //meta-uncertainties
 vector<lower=0>[ns] nconf_v; //negative confidence criteria
 vector<lower=0>[ns] pconf_v; //positive confidence criteria

 //Perceptual hyperparameters
 vector[ncont] ssm_p; //mu hyperparameters of each contrast level's stimulus sensitivity lognormal prior
 vector<lower=0>[ncont] sss_p; //sigma hyperparameters of each contrast level's stimulus sensitivity lognormal prior
 vector[ncont] scm_p; //mu hyperparameters of each contrast level's decision criterion normal prior
 vector<lower=0>[ncont] scs_p; //sigma hyperparameters of each contrast level's s decision criterion normal prior
 vector[ncond] mum_p; //mu hyperparameters of each condition's meta-uncertainty lognormal prior
 vector<lower=0>[ncond] mus_p; //sigma hyperparameters of each condition's meta-uncertainty lognormal prior
 vector[ncond] nccm_p; //mu hyperparameters of each condition's negative confidence criterion lognormal prior
 vector<lower=0>[ncond] nccs_p; //sigma hyperparameters of each condition's negative confidence criterion lognormal prior
 vector[ncond] pccm_p; //mu hyperparameters of each condition's positive confidence criterion lognormal prior
 vector<lower=0>[ncond] pccs_p; //sigma hyperparameters of each condition's positive confidence criterion lognormal prior

 //Value-based hyperparameters
 real ssm_v;
 real<lower=1> sss_v;
 real mum_v;
 real<lower=1> mus_v;
 real nccm_v;
 real<lower=1> nccs_v;
 real pccm_v;
 real<lower=1> pccs_v;

 //Correlation parameter
 vector<lower=-1,upper=1>[ncond] rho;

}
model {

 //Perceptual hyperpriors
 ssm_p ~ normal(0,1);
 sss_p ~ lognormal(0,1);
 scm_p ~ normal(0,1);
 scs_p ~ lognormal(0,1);
 mum_p ~ normal(0,1);
 mus_p ~ lognormal(0,1);
 nccm_p ~ normal(0,1);
 nccs_p ~ lognormal(0,1);
 pccm_p ~ normal(0,1);
 pccs_p ~ lognormal(0,1);

 //Value-based hyperpriors
 ssm_v ~ normal(0,1);
 sss_v ~ lognormal(0,1);
 mum_v ~ normal(0,1);
 mus_v ~ lognormal(0,1);
 nccm_v ~ normal(0,1);
 nccs_v ~ lognormal(0,1);
 pccm_v ~ normal(0,1);
 pccs_v ~ lognormal(0,1);

 //Perceptual priors
 guess_p  ~ beta(1,197.0/3.0);

 for (cont in 1:ncont) {
  sens_p[cont] ~ lognormal(ssm_p[cont],sss_p[cont]);
  crit_p[cont] ~ normal(scm_p[cont],scs_p[cont]);
 }

 for (cond in 1:ncond) {
  meta_p[cond] ~ lognormal(mum_p[cond],mus_p[cond]);
  nconf_p[cond] ~ lognormal(nccm_p[cond],nccs_p[cond]);
  pconf_p[cond] ~ lognormal(pccm_p[cond],pccs_p[cond]);
 }

 //Value-based priors
 guess_v ~ beta(1,197.0/3.0);
 sens_v ~ lognormal(ssm_v,sss_v);
 meta_v ~ lognormal(mum_v,mus_v);
 nconf_v ~ lognormal(nccm_v,nccs_v);
 pconf_v ~ lognormal(pccm_v,pccs_v);

 //declare the transformed x-axis matrices
 array[ncond] matrix[sampn,ns] xtrans_p;
 array[ncond] matrix[sampn,ns] sds_p;

 //Perceptual likelihood model (with local variable calculations)
 for (cond in 1:ncond) {

  //calculate the transformed x-axis matrices
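  //(these mu and sigma values give a lognormal with mean 1 and standard deviation meta_p[cond])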
  xtrans_p[cond] = log_normal_qf(sampx,-0.5*log1p(meta_p[cond]'.^2),sqrt(log1p(meta_p[cond]'.^2)));
  sds_p[cond] = 1./xtrans_p[cond];

  for (cont in 1:ncont) {

   //check if condition and contrast level combination has non-zero data
   if (sum(abs(vals_p[cond][cont])) != 0) {

    //declare matrices for calculating each contrast level's transformed data
    array[ncont] matrix[nt_p,ns] om_p;
    array[ncont] row_vector[ns] cm_p;
    array[ncont] matrix[nt_p,ns] sm_p;

    //orientations and decision criteria in terms of standard deviations
    om_p[cont] = vals_p[cond][cont].*rep_matrix(sens_p[cont],nt_p);
    cm_p[cont] = crit_p[cont].*sens_p[cont];
    
    //transform the data
    sm_p[cont] = om_p[cont]-rep_matrix(cm_p[cont],nt_p);

    //hand the data and parameters to the reduce sum wrap function for dynamic within-chain parallelization
    target += reduce_sum(partial_sum_casandre_log, resps_p[cond][cont], 1, guess_p, sds_p[cond], sm_p[cont], nconf_p[cond]', pconf_p[cond]');

   }

  }

 }

 //Value-based likelihood model (with local variable calculations)

 //declare the transformed x-axis matrices
 matrix[sampn,ns] xtrans_v;
 matrix[sampn,ns] sds_v;

 //calculate the transformed x-axis matrices
 xtrans_v = log_normal_qf(sampx,-0.5*log1p(meta_v.^2),sqrt(log1p(meta_v.^2)));
 sds_v = 1./xtrans_v;

 //declare matrices for calculating transformed data
 matrix[nt_v,ns] om_v;
 matrix[nt_v,ns] sm_v;

 //orientations in terms of standard deviations
 om_v = vals_v.*rep_matrix(sens_v',nt_v);

 //transform the data
 sm_v = om_v;
 
 //hand the data and parameter to the reduce sum wrap function for dynamic within-chain parallelization
 target += reduce_sum(partial_sum_casandre_log, resps_v, 1, guess_v, sds_v, sm_v, nconf_v, pconf_v);

 //Correlation copula
 vector[ncond] y_p;
 real y_v;

 for (n in 1:ns) {

  for (cond in 1:ncond) {

   y_p[cond] = inv_Phi(lognormal_cdf(meta_p[cond,n],mum_p[cond],mus_p[cond]));
   y_v = inv_Phi(lognormal_cdf(meta_v[n],mum_v,mus_v));

   target += normal_copula(y_p[cond],y_v,rho[cond]);

  }

 }

}

That I don’t know—I’ve never tried to fit copula models. But I would test by seeing if I can simulate data and recover the right posterior interval coverage.
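For instance, a minimal sketch of simulating the correlated parameters from the first model’s generative process (this uses the fact that a Gaussian copula with lognormal margins is just a correlated bivariate lognormal; the _sim names are made up, and simulating the observed data from these parameters through the likelihood is elided, as in the models above):

generated quantities {
  array[ns] real meta_p_sim;
  array[ns] real meta_v_sim;
  for (n in 1:ns) {
    //draw correlated standard normals with correlation rho
    real z_p = normal_rng(0,1);
    real z_v = rho * z_p + sqrt(1 - square(rho)) * normal_rng(0,1);
    //the lognormal quantile of Phi(z) is just exp(mu + sigma * z)
    meta_p_sim[n] = exp(mu_p + sigma_p * z_p);
    meta_v_sim[n] = exp(mu_v + sigma_v * z_v);
  }
}

Then fit the model to the simulated data and check that the posterior intervals for rho and the hyperparameters cover the generating values at the nominal rate.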

I would strongly suggest a non-centered parameterization even if you can get the centered one to fit with enough draws. It may turn out you have enough data and/or a strong enough prior not to need it, but it’s easy to test. We’re trying to figure out how to make them easier for lognormals, but for now, it has to be done manually.

P.S. I wish someone would write a User’s Guide chapter on them so I can learn them easily, but I’ll probably just try to do it myself (contributing to the User’s Guide is great for learning stats, in my experience).

I’m attempting this both with a Cholesky factor to model the dependency directly, and with a Cholesky factor to model the dependency on the transformed variables like you suggested. Unfortunately, because of the nature of our data (too little within-subject variability), we can only get stable fits when we model hierarchically (taking advantage of the between-subject variability). The copula is an additional layer on top of that, to model a dependency between data collection regimes (the thing we actually care about).
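For reference, a minimal sketch of what I mean for the third model’s copula section, with one 4-dimensional Gaussian copula in place of the six pairwise ones (L is a placeholder name, and y_p is assumed declared as vector[4] as before):

parameters {
  cholesky_factor_corr[4] L;
...
model {
  L ~ lkj_corr_cholesky(1);

  for (n in 1:ns) {

   for (cond in 1:4) {

    y_p[cond] = inv_Phi(lognormal_cdf(meta_p[cond,n],mu_p[cond],sigma_p[cond]));

   }

   //Gaussian copula log density: multivariate normal log density at the
   //scores minus the standard normal log densities of the margins
   target += multi_normal_cholesky_lpdf(y_p | rep_vector(0,4), L)
             - std_normal_lpdf(y_p);

  }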

Take a look at this copula primer that @andrjohns put together using Stan Copulas


@Corey.Plate
Also take a look at Vine copula examples in Stan - #3 by jmh530
The model simple_cvine_copula.stan works. There is also a linked paper with more detail about the conditional copula.
What I see as a difference from the model above is that there is no such difference calculation (the conditional CDF step) in the model above. See:

// fit second level
// The general approach is to get each conditional CDF. This requires
// taking the derivative of the copula CDF when holding one variable
// constant. Things are simpler here because I’m using normal marginals
// and copulae, but this doesn’t hold generally. What I’m doing here
// is essentially regressing variables 2 and 3 on variable 1 and
// and dividing by the residual standard deviation. Since we have
// standardized the data above, it simplifies the calculations here
// and I don’t need to do the regression because the correlations are
// re-used from above. Works for normals, but not more generally.
Uinv_2nd_level[, 1] = (Uinv[, 2] - p12 * Uinv[, 1]) / sqrt(1 - square(p12));
Uinv_2nd_level[, 2] = (Uinv[, 3] - p13 * Uinv[, 1]) / sqrt(1 - square(p13));
Uinv_2nd_level[, 2] ~ normal_copula_vector(Uinv_2nd_level[, 1], p23_given_1);


Works perfectly!