Calculating Bayes Factor for Hypothesis Testing of Copula-Based Correlation from MCMC Samples

Hey everyone,

Our lab has fit a model that uses a copula, with a uniform prior on the correlation parameter, to estimate the correlation between groups of parameters in a hierarchical model with a custom likelihood function. The full Stan model is included below.

My question is: given our MCMC samples for rho, our correlation parameter, how can we calculate a Bayes factor for a hypothesis test in which the null hypothesis is that there is no correlation between the groups of parameters and the alternative hypothesis is that there is a true correlation between them? And how should we interpret that Bayes factor once it is calculated?

And are we thinking about this in the right way (i.e. is this the question we should be asking of our posteriors if we want to evaluate the correlation that underpins them)?

As always, I appreciate the help!

functions {

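 // lognormal quantile function: evaluates the inverse CDF at each probability in x for every (m, s) pair, returning a size(x) by size(m) matrix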
 matrix log_normal_qf(vector x, vector m, vector s) {

  return exp(rep_matrix(s' * sqrt(2), size(x)) .* rep_matrix(inv_erfc(-2 * x + 2), size(m)) + rep_matrix(m', size(x)));

 }

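 // log density of the bivariate Gaussian copula with correlation rho, where u and v are already on the standard-normal scale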
 real normal_copula(real u, real v, real rho) {

  real rho_sq = square(rho);
  
  return (0.5 * rho * (-2. * u * v + square(u) * rho + square(v) * rho)) / (-1. + rho_sq) - 0.5 * log1m(rho_sq);

 }

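 // custom likelihood: returns the summed log likelihood of the 4-column response matrices in resps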
 real casandre_log(array[] matrix resps, vector guess, matrix sds, matrix sm, vector nconf, vector pconf) {

  real llhC;
  llhC = 0;

  for (n in 1:size(resps)) {

   array[rows(resps[n])] int m;

   for (tt in 1:size(m)) {
      m[tt] = sum(resps[n][tt]) == 1;
   }

   matrix[rows(sm),4] lhC;

   if (sum(resps[n]) != 0) {

    int t;
     array[sum(m)] int ind;

    t = 1;

    matrix[rows(sm),rows(sds)] avgs;
    avgs = rep_matrix(sm[,n],rows(sds)).*rep_matrix(sds[,n]',rows(sm));
  
    for (tr in 1:rows(sm)){

     if (sum(resps[n][tr]) != 0) {

      matrix[3,rows(sds)] raws;

      for (rws in 1:rows(sds)) {
       raws[1,rws] = normal_cdf(-nconf[n] | avgs[tr,rws], sds[rws,n]);
       raws[2,rws] = normal_cdf(0 | avgs[tr,rws], sds[rws,n]);
       raws[3,rws] = normal_cdf(pconf[n] | avgs[tr,rws], sds[rws,n]);
      }

      vector[3] ratiodist;

      ratiodist[1] = mean(raws[1]);
      ratiodist[2] = mean(raws[2]);
      ratiodist[3] = mean(raws[3]);

      lhC[tr,1] = ((guess[n]/4) + ((1-guess[n])*ratiodist[1]));
      lhC[tr,2] = ((guess[n]/4) + ((1-guess[n])*(ratiodist[2]-ratiodist[1])));
      lhC[tr,3] = ((guess[n]/4) + ((1-guess[n])*(ratiodist[3]-ratiodist[2])));
      lhC[tr,4] = ((guess[n]/4) + ((1-guess[n])*(1-ratiodist[3])));

      ind[t] = tr;

      t = t + 1;

     }

    }

    llhC += sum(columns_dot_product(resps[n][ind], log(lhC[ind])));

   }
  
  }

  return llhC;
  
 }

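 // partial-sum wrapper for reduce_sum: evaluates casandre_log on one slice of subjects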
 real partial_sum_casandre_log(array[] matrix slice_n_resps, int start, int end, vector guess, matrix sds, matrix sm, vector nconf, vector pconf) {

 return casandre_log(slice_n_resps, guess[start:end], sds[:,start:end], sm[:,start:end], nconf[start:end], pconf[start:end]);

 }
  
}
data {

 int ns;

 int nt_p;
 int nt_v;

 int ncond;
 int ncont;

 array[ncond,ncont,ns] matrix[nt_p,4] resps_p;

 array[ns] matrix[nt_v,4] resps_v;

 array[ncond,ncont] matrix[nt_p,ns] vals_p;

 matrix[nt_v,ns] vals_v;

 int sampn;
 vector[sampn] sampx;

}
parameters {

 vector<lower=0,upper=1>[ns] guess_p;
 matrix<lower=0>[ncont,ns] sens_p;
 matrix[ncont,ns] crit_p;
 vector<lower=0>[ns] meta_p;
 matrix<lower=0>[ncond,ns] nconf_p;
 matrix<lower=0>[ncond,ns] pconf_p;

 vector<lower=0,upper=1>[ns] guess_v;
 vector<lower=0>[ns] sens_v;
 vector<lower=0>[ns] meta_v;
 vector<lower=0>[ns] nconf_v;
 vector<lower=0>[ns] pconf_v;

 vector[ncont] ssm_p;
 vector<lower=0>[ncont] sss_p;
 vector[ncont] scm_p;
 vector<lower=0>[ncont] scs_p;
 real mum_p;
 real<lower=0> mus_p;
 vector[ncond] nccm_p;
 vector<lower=0>[ncond] nccs_p;
 vector[ncond] pccm_p;
 vector<lower=0>[ncond] pccs_p;

 real ssm_v;
 real<lower=1> sss_v;
 real mum_v;
 real<lower=1> mus_v;
 real nccm_v;
 real<lower=1> nccs_v;
 real pccm_v;
 real<lower=1> pccs_v;

 real<lower=-1,upper=1> rho;

}
model {

 ssm_p ~ normal(0,1);
 sss_p ~ lognormal(0,1);
 scm_p ~ normal(0,1);
 scs_p ~ lognormal(0,1);
 mum_p ~ normal(0,1);
 mus_p ~ lognormal(0,1);
 nccm_p ~ normal(0,1);
 nccs_p ~ lognormal(0,1);
 pccm_p ~ normal(0,1);
 pccs_p ~ lognormal(0,1);

 ssm_v ~ normal(0,1);
 sss_v ~ lognormal(0,1);
 mum_v ~ normal(0,1);
 mus_v ~ lognormal(0,1);
 nccm_v ~ normal(0,1);
 nccs_v ~ lognormal(0,1);
 pccm_v ~ normal(0,1);
 pccs_v ~ lognormal(0,1);

 guess_p  ~ beta(1,197.0/3.0);
 meta_p ~ lognormal(mum_p,mus_p);

 for (cont in 1:ncont) {
  sens_p[cont] ~ lognormal(ssm_p[cont],sss_p[cont]);
  crit_p[cont] ~ normal(scm_p[cont],scs_p[cont]);
 }

 for (cond in 1:ncond) {
  nconf_p[cond] ~ lognormal(nccm_p[cond],nccs_p[cond]);
  pconf_p[cond] ~ lognormal(pccm_p[cond],pccs_p[cond]);
 }

 guess_v ~ beta(1,197.0/3.0);
 sens_v ~ lognormal(ssm_v,sss_v);
 meta_v ~ lognormal(mum_v,mus_v);
 nconf_v ~ lognormal(nccm_v,nccs_v);
 pconf_v ~ lognormal(pccm_v,pccs_v);

 for (cond in 1:ncond) {

  matrix[sampn,ns] xtrans_p;
  matrix[sampn,ns] sds_p;

  xtrans_p = log_normal_qf(sampx,-0.5*log1p(meta_p.^2),sqrt(log1p(meta_p.^2)));
  sds_p = 1./xtrans_p;

  for (cont in 1:ncont) {

   if (sum(abs(vals_p[cond][cont])) != 0) {

    array[ncont] matrix[nt_p,ns] om_p;
    array[ncont] row_vector[ns] cm_p;
    array[ncont] matrix[nt_p,ns] sm_p;

    om_p[cont] = vals_p[cond][cont].*rep_matrix(sens_p[cont],nt_p);
    cm_p[cont] = crit_p[cont].*sens_p[cont];
    
    sm_p[cont] = om_p[cont]-rep_matrix(cm_p[cont],nt_p);

    target += reduce_sum(partial_sum_casandre_log, resps_p[cond][cont], 1, guess_p, sds_p, sm_p[cont], nconf_p[cond]', pconf_p[cond]');

   }

  }

 }

 matrix[sampn,ns] xtrans_v;
 matrix[sampn,ns] sds_v;

 xtrans_v = log_normal_qf(sampx,-0.5*log1p(meta_v.^2),sqrt(log1p(meta_v.^2)));
 sds_v = 1./xtrans_v;

 matrix[nt_v,ns] om_v;
 matrix[nt_v,ns] sm_v;

 om_v = vals_v.*rep_matrix(sens_v',nt_v);

 sm_v = om_v;
 
 target += reduce_sum(partial_sum_casandre_log, resps_v, 1, guess_v, sds_v, sm_v, nconf_v, pconf_v);

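 // Gaussian copula over the metacognitive-noise parameters: map meta_p and meta_v to standard-normal scores via their lognormal CDFs, then add the copula log density with correlation rho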
 real y_p;
 real y_v;

 for (n in 1:ns) {

  y_p = inv_Phi(lognormal_cdf(meta_p[n] | mum_p, mus_p));
  y_v = inv_Phi(lognormal_cdf(meta_v[n] | mum_v, mus_v));

  target += normal_copula(y_p,y_v,rho);

 }

}

Bayes factors don’t require a null hypothesis and an alternative hypothesis. They’re just a way of comparing the prior predictive distributions of a pair of models.

All you need to be able to do for Bayes factors is evaluate the data marginal p(y), where y is your data. With a Bayesian model, that’s just p(y) = \int_\Theta p(y \mid \theta) \cdot p(\theta) \, \textrm{d}\theta. For that, it doesn’t matter what the parameters \theta are in the two models being compared.

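For intuition, here is a naive Monte Carlo sketch of that integral in Python: draw theta from the prior and average the likelihood over the draws. The toy model here (normal data with known sd and a normal prior on the mean) is purely an assumption for illustration, not the copula model above.

import numpy as np

rng = np.random.default_rng(1)
sigma = 1.0                                  # known observation sd
y = rng.normal(0.7, sigma, size=50)          # simulated toy "data"

# theta ~ p(theta): here a Normal(0, 2) prior on the mean mu
prior_draws = rng.normal(0.0, 2.0, size=100_000)

# log p(y | mu_s) for every prior draw, via broadcasting over draws and observations
log_liks = np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - 0.5 * ((y[None, :] - prior_draws[:, None]) / sigma) ** 2, axis=1)

# log p(y) ~= log of the average likelihood over prior draws (log-sum-exp for stability)
log_py = np.logaddexp.reduce(log_liks) - np.log(log_liks.size)
print("naive Monte Carlo estimate of log p(y):", log_py)

With more than a handful of parameters, almost every prior draw lands where the likelihood is negligible, so this simple estimator has enormous variance.
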
It turns out that estimating p(y) is challenging! As far as I know, we don’t have any tooling in Stan for computing Bayes factors.

Maybe not, if you’re trying to compute Bayes factors! I’d very strongly recommend reading the section on Bayes factors in Gelman et al.'s Bayesian Data Analysis (available free online as a PDF), as well as the linked papers by Aki Vehtari, Yuling Yao, and Jonah Gabry on LOO and WAIC.

Constructively, I’d recommend either posterior predictive checks or cross-validation. I think PPCs are helpful in figuring out where your model is failing, and I prefer cross-validation to Bayes factors because it evaluates posterior predictive inference rather than prior predictive inference. The discussion in BDA and Aki et al.'s papers is much more nuanced.

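To make the PPC idea concrete, here is a minimal sketch in Python: for each posterior draw, simulate a replicated data set, compute a test statistic on it, and compare those replicated statistics to the same statistic on the observed data. The simulate() helper and the choice of statistic are hypothetical placeholders for whatever is meaningful in your model.

import numpy as np

def posterior_predictive_check(posterior_draws, y_obs, simulate, stat=np.mean, rng=None):
    # posterior_draws: iterable of parameter draws
    # simulate(theta, rng): returns one replicated data set y_rep given a draw
    # stat: maps a data set to a scalar test statistic
    rng = rng if rng is not None else np.random.default_rng()
    stat_rep = np.array([stat(simulate(theta, rng)) for theta in posterior_draws])
    stat_obs = stat(y_obs)
    # posterior predictive p-value: how often the replicated statistic exceeds the observed one
    return np.mean(stat_rep >= stat_obs), stat_rep, stat_obs

Replicated statistics that consistently land far from the observed one (a p-value near 0 or 1) point to an aspect of the data the model fails to reproduce.
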
Leave-one-out (LOO) cross-validation evaluates

p(y_n \mid y_{-n}) = \int p(y_n \mid \theta) \cdot p(\theta \mid y_{-n}) \, \textrm{d}\theta

for every n, where y_{-n} = y_1, \ldots, y_{n-1}, y_{n + 1}, \ldots, y_N, then averages the results over all n to get an estimate of the expected log pointwise predictive density (ELPD). You can see that cross-validation depends on the posterior p(\theta \mid y_{-n}), whereas Bayes factors depend on the prior p(\theta).
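
To make that formula concrete, here is a brute-force sketch in Python for a toy conjugate model (normal data with known sd and a normal prior on the mean), where the leave-one-out posterior is available in closed form. The toy model is an assumption for illustration only; for a real Stan model you would compute pointwise log likelihoods in generated quantities and use PSIS-LOO (e.g., the loo R package or ArviZ) rather than refitting N times.

import numpy as np

rng = np.random.default_rng(2)
sigma, tau = 1.0, 2.0                 # known data sd, prior sd of the mean mu
y = rng.normal(0.7, sigma, size=50)   # simulated toy "data"

def loo_posterior(y_minus_n):
    # closed-form p(mu | y_{-n}) for the conjugate normal-normal model
    prec = 1.0 / tau**2 + len(y_minus_n) / sigma**2
    return np.sum(y_minus_n) / sigma**2 / prec, np.sqrt(1.0 / prec)

def normal_lpdf(x, mu, sd):
    return -0.5 * np.log(2 * np.pi * sd**2) - 0.5 * ((x - mu) / sd) ** 2

lpd = []
for n in range(len(y)):
    mu_mean, mu_sd = loo_posterior(np.delete(y, n))           # posterior without y_n
    draws = rng.normal(mu_mean, mu_sd, size=4000)             # draws from p(mu | y_{-n})
    lp = normal_lpdf(y[n], draws, sigma)                      # log p(y_n | mu_s)
    lpd.append(np.logaddexp.reduce(lp) - np.log(lp.size))     # log p(y_n | y_{-n})

print("average leave-one-out log predictive density:", np.mean(lpd))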