Copula regression example (Gaussian & Poisson)

Hi

This is a question about copulas in Stan. I am trying to estimate a copula model which introduces some correlation between a Gaussian and a Poisson regression. This is my data generating process:

library(MASS)
s_size <- 600
x1 <- rnorm(s_size, mean = 1, sd = 3)
x2 <- sample(c(0, 1), s_size, replace = TRUE)

# correlation parameter
rho_pop <- .4
cov_matrix <- matrix(c(1, rho_pop, rho_pop, 1), ncol = 2)
cr_error <- mvrnorm(s_size, mu = c(0, 0), Sigma = cov_matrix)
# transform gaussian error to uniform
x_transf <- pnorm(cr_error)
# transform outcomes back to gaussian/Poisson
y_gauss <- qnorm(x_transf[,2], mean = (1 + .5 * x1 +.05 * x2),  sd = .8)
y_pois <- qpois(x_transf[,1], lambda = exp(-1 + 0.3 * x1 - 0.5 * x2))
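
A quick sanity check (my addition, not from the original post): the copula correlation is defined on the latent Gaussian errors rather than on the outcome scale, so the empirical correlation of cr_error should be close to rho_pop.

cor(cr_error)[1, 2]   # close to rho_pop = 0.4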

The Stan code I use is provided below. The copula code works fine with simple outcomes (without a regression model). However, when I use it for the regression setup, I consistently estimate the wrong copula correlation parameter. The issue seems to arise when I introduce the transformed parameters step that maps the Gaussian and Poisson outcomes to uniforms.

functions {
real normal_copula_llpdf(real u, real v, real rho) {
  real rho_sq = square(rho);
  return (0.5 * rho * (-2. * u * v + square(u) * rho + square(v) * rho)) / (-1. + rho_sq)
         - 0.5 * log1m(rho_sq);
}
}
data {
	int<lower=1> s_size; // number of responses
	int<lower=0> K1; //        Number of independent variables
	int<lower=0> K2; //        Number of independent variables
  matrix[s_size, K1] mX1;     // independent variables
  matrix[s_size, K2] mX2;     // independent variables
	vector[s_size] y_gauss; 
	int y_pois[s_size]; 
}
parameters {
  real mu_gauss;
  vector[K1] betas_gauss;
  real<lower=0> sigma_gauss; 
  real mu_pois;
  vector[K2] betas_pois;
  real<lower=-1,upper= 1> rho;
}
transformed parameters {
// transform gaussian outcomes to uniform by taking normal cdf
  real<lower=0,upper=1> y_gauss_cop[s_size];
  real<lower=0,upper=1> y_pois_cop[s_size];
  for (i in 1:s_size){
      y_gauss_cop[i] = normal_cdf(y_gauss[i], mu_gauss+mX1[i,]*betas_gauss,sigma_gauss);
      y_pois_cop[i] = poisson_cdf(y_pois[i], exp(mu_pois + mX2[i,]*betas_pois));
}}
model {
  rho ~ normal(0,1);
  y_gauss ~ normal(mu_gauss + mX1*betas_gauss, sigma_gauss);
  y_pois ~ poisson(exp(mu_pois + mX2*betas_pois));
  // copula pdf
  for (i in 1:s_size){
  target +=  normal_copula_llpdf(y_gauss_cop[i], y_pois_cop[i], rho);
  }
}

I should also mention I’m not 100% sure whether the mistake is in the Stan code or whether it is a conceptual one (so perhaps I’m just formulating the wrong model). Would anybody have an idea how to tackle this?

You can’t directly use discrete outcomes in copulas, since there isn’t a unique mapping from the uniform to the discrete space; you need to use a data-augmentation approach.
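
A minimal R illustration of that non-uniqueness (my own, not from the linked example): for a discrete margin a whole interval of uniforms maps to the same count, so u cannot be recovered from y and has to be treated as a latent variable.

lambda <- 2
qpois(c(0.20, 0.30, 0.40), lambda)    # all three uniforms give the same count, y = 1
ppois(1, lambda) - ppois(0, lambda)   # width of the interval of u values mapped to y = 1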

I’ve got an example here: Copulas


That’s a great post on copulas. I have a few questions about the model in the “Mixed Continuous-Discrete Marginals” section of the link above, which I don’t understand.

model {
  matrix[N, 4] u_mix;
  for (n in 1:N) {
    u_mix[n, 1] = exponential_cdf(Y[n,1] | lambda_exp);
    u_mix[n, 2] = chi_square_cdf(Y[n,2] | nu);
    u_mix[n, 3] = u[n, 1];
    u_mix[n, 4] = u[n, 2];
  }

  Y[, 1] ~ exponential(lambda_exp);
  Y[, 2] ~ chi_square(nu);
  u_mix ~ gauss_copula_cholesky(rho_chol);
}

We have sampling statements for Y[,1] and Y[,2], but there is no fitting of the Poissons and binomials?
Shouldn’t there also be Stan

pois_y ~ poisson_log(...)
binom_y ~ binomial(...)

statements?

The expressions

u_1 \sim U\left(F_1(x_1^-), F_1(x_1)\right) \\ u_2 \sim U\left(F_2(x_2^-), F_2(x_2)\right)

correspond in the Stan code to this line in the function gauss_copula_cholesky_lpdf:

q[n] = inv_Phi(u[n]);

But this assumes a uniform(0, 1) distribution and not a uniform(a, b) distribution. What am I missing here?

Take a look at:
https://github.com/spinkney/helpful_stan_functions/blob/main/functions/copula/centered_gaussian_copula.stanfunctions

/**
 * Poisson marginal
 *
 * Poisson marginal for mixed continuous-discrete Gaussian
 * copula.
 *
 * The lower-bound and upper-bound: \n
 *      The upper-bound is always at \code{.cpp} inv_Phi( y, mu ) \endcode \n
 *      If \f$y \ne 0\f$, lower-bound at \code{.cpp} inv_Phi(y-1, mu) \endcode. \n
 *      At \f$y = 0\f$ the lower-bound is \f$ 0 \f$.
 * @copyright Ethan Alt, Sean Pinkney 2021 \n
 *
 * @param y int[,] 2D array of integer counts
 * @param mu_glm matrix of regression means
 * @param u_raw matrix of nuisance latent variables
 * @return 2D array of matrices containing the random variables
 *         and jacobian adjustments
 */
array[] matrix poisson_marginal(array[,] int y, matrix mu_glm, matrix u_raw) {
  int N = rows(mu_glm);
  int J = cols(mu_glm);
  matrix[N, J] mu_glm_exp = exp(mu_glm);
  array[2] matrix[N, J] rtn;
  
  for (j in 1 : J) {
    for (n in 1 : N) {
      real Ubound = poisson_cdf(y[n, j] | mu_glm_exp[n, j]);
      real Lbound = 0;
      if (y[n, j] > 0) {
        Lbound = poisson_cdf(y[n, j] - 1 | mu_glm_exp[n, j]);
      }
      real UmL = Ubound - Lbound;
      rtn[1][n, j] = inv_Phi(Lbound + UmL * u_raw[n, j]);
      rtn[2][n, j] = log(UmL);
    }
  }
  
  return rtn;
}

This code line:

  rtn[1][n, j] = inv_Phi(Lbound + UmL * u_raw[n, j]);

is just the mapping from a discrete cdf to a continuous one.
And the following

  rtn[2][n, j] = log(UmL);

is the corresponding pdf distribution.

We have sampling statements for Y[,1] and Y[,2], but there is no fitting of the Poissons and binomials?

The discrete marginals contribute to the likelihood through the latent variables u. See sections 2.2 and 2.3 of Smith & Khaled (2012) for the technical details.

This code line:

  rtn[1][n, j] = inv_Phi(Lbound + UmL * u_raw[n, j]);

is just the mapping from a discrete cdf to a continuous one.

That’s not quite correct. That line is transforming a U(0,1) to a U(F_1(x_1^-),F_1(x_1)) variable before applying the standard-normal quantile.

And the following

  rtn[2][n, j] = log(UmL);

is the corresponding pdf distribution.

That is the Jacobian adjustment required for the transformation above.

In the examples I posted, the u parameters are declared with the corresponding F_1(x_1^-) and F_1(x_1) bounds directly, so Stan handles the transformation and Jacobian adjustment internally.
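
A small R sketch of the rescaling being described (my own illustration): if u ~ U(0,1), then Lb + (Ub - Lb) * u ~ U(Lb, Ub), and the standard-normal quantile is applied afterwards; declaring the parameter with those bounds in Stan performs the same transform and Jacobian adjustment automatically.

u  <- runif(1e5)
Lb <- 0.3
Ub <- 0.7
v  <- Lb + (Ub - Lb) * u   # v ~ U(0.3, 0.7)
range(v)
q  <- qnorm(v)             # the values that enter the Gaussian copula density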


@andrjohns I’m also enjoying your post. The section “Bayesian Estimation: Data Augmentation” lost me at first with finding the upper and lower bounds, but after rereading it a few times and looking at Smith and Khaled, I eventually got it (I think).

When I run the mixed continuous-discrete marginals model I am getting a lot of warnings like this at the start of the run:

Is this inevitable, or can it be avoided?

That’s fine and to be expected, unfortunately. The _cdf functions used to define the bounds can easily over-/under-flow to 0/1, triggering the rejection.

A more robust implementation would use log-CDFs, but we don’t have very good implementations of those in Stan (the majority simply evaluate the CDF and then take its log).
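
For example (an R illustration of my own; the same happens with Stan’s _cdf functions in double precision):

ppois(0, lambda = 800)   # underflows to exactly 0, so inv_Phi(0) = -Inf
pnorm(9)                 # rounds to exactly 1, so inv_Phi(1) = Inf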


OK, yes, that makes sense, thank you. One other thing I was wondering: do you have a continuous-Bernoulli worked example anywhere?

I have found the easiest approach for copulas (by far) is to implement the variable-augmentation approach of Smith and Khaled (2012).

Specifically, you can write the complete data density as

f(y_c, y_d, u) = c(u | \Gamma) \left[ \prod_{j \in \mathcal{C}} f_j(y_j) \right] \left[ \prod_{j \in \mathcal{D}} I( F_j(y_j-1) \le u_{j} \le F_j(y_j) ) \right],

where c(\cdot) is the copula density function, \mathcal{C} contains the indices of the continuous margins, \mathcal{D} contains the indices of the discrete margins, and

u_{j} = F_j(y_j) \text{ if } j \in \mathcal{C}.

Note that the above density implies

U_j \sim U( F_j(y_j - 1), F_j(y_j) ), \ \ j \in \mathcal{D}

so your latent U_j's will have varying bounds.
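
For a concrete case (my own sketch, matching the Lb/Ub construction in the Stan code below): with a Bernoulli(p) margin the bounds are F(y - 1) and F(y), i.e. (0, 1 - p) when y = 0 and (1 - p, 1) when y = 1.

p  <- 0.3
y  <- c(0, 1)
Lb <- y * (1 - p)       # F(y - 1): 0 for y = 0, 1 - p for y = 1
Ub <- 1 - (1 - y) * p   # F(y):     1 - p for y = 0, 1 for y = 1
cbind(y, Lb, Ub)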

For the Gaussian copula, we have

c(u | \Gamma) = |\Gamma|^{-1/2} \exp\left\{ -\frac{1}{2}\, q(u)'[ \Gamma^{-1} - I ]\, q(u) \right\}

where q(u) = (q_1, \ldots, q_J)' = ( \Phi^{-1}(u_1), \ldots, \Phi^{-1}(u_J ) )'.

If the j^{th} margin is Gaussian, then u_j = \Phi((y_j - \mu_j) / \sigma_j) is not really latent and we have q_j = \Phi^{-1}[ \Phi( (y_j - \mu_j) / \sigma_j ) ] = (y_j - \mu_j) / \sigma_j.

The following Stan code appears to recover the truth reasonably well for simulated data. It should be relatively straightforward to extend this to a regression setting. I wrote the copula density function in terms of q(u) rather than u itself since, as noted above, writing it in terms of u would result in unnecessary calculations for normal margins.

functions {
  real gaussian_copula_mvn_chol_lpdf(
    matrix Q, matrix L
  ) {
    int n = rows(Q);
    int K = cols(Q);
    matrix[K,K] Rhoinv = chol2inv(L);
    // M = (Rho^(-1) - I) * Q'Q = A * Q'Q
    matrix[K,K] M = add_diag(Rhoinv, -1.) .* crossprod(Q);
    // tr(A %*% Q'Q) = sum(A * Q'Q)
    return -n * sum(log(diagonal(L))) - 0.5 * sum(M);
  }
}
data {
  int<lower=0> n;                       // sample size
  int<lower=0> Jn;                      // number of normal margins
  int<lower=0> Jb;                      // number of binary margins
  matrix[n, Jn] yn;                     // normal random variates
  array[n, Jb] int<lower=0,upper=1> yb; // binary random variates
}
transformed data {
  int J = Jn + Jb;
  matrix[n,Jb] ybmat = to_matrix(yb);
}
parameters {
  cholesky_factor_corr[J] L;
  vector[Jn] mu;
  vector<lower=0>[Jn] sigma;
  vector<lower=0,upper=1>[Jb] p;
  matrix<lower=0,upper=1>[n,Jb] u_raw;
}
model {
  //---------------------------
  // Temporary variables
  //---------------------------
  matrix[n,J] Q;    // Partially latent Q matrix (Q[, j] = Phi_inv(U[, j]))
  vector[n] Lb;     // lower bound for uniform variates
  vector[n] Ub;     // upper bound for uniform variates
  vector[n] Db;     // = Ub - Lb (useful for Jacobian)
  int k = 0;        // counter for constructing Q matrix
  //---------------------------
  // Priors
  //---------------------------
  L     ~ lkj_corr_cholesky(1.0);
  mu    ~ normal(0, 10);
  sigma ~ normal(0, 10);
  
  //---------------------------
  // Complete data likelihood
  //---------------------------
  // Normal outcomes
  for ( j in 1:Jn ) {
    k += 1;
    yn[, j] ~ normal(mu[j], sigma[j]);
    Q[, k] = inv(sigma[j]) * (yn[, j] - mu[j]);  // = inv_Phi(U[, k]), standard normal
  }
  // Binary outcomes
  for ( j in 1:Jb ) {
    k += 1;
    Lb = ybmat[, j] * (1 - p[j]);       // lower bound of latent uniform
    Ub = 1 - (1 - ybmat[, j]) * p[j];   // upper bound of latent uniform
    Db = Ub - Lb;                       // useful for jacobian adjustment
    target += log(Db);                  // Jacobian adjustment
    Q[, k] = inv_Phi(Lb + Db .* u_raw[, j]); 
  }
  // Copula density
  Q ~ gaussian_copula_mvn_chol(L);
}
generated quantities {
  matrix[J,J] Corr = multiply_lower_tri_self_transpose(L);
}

To simulate copula variables in R and test the approach, I have

## Generate copula random variates
n     <- 500
rho   <- 0.5
Jn    <- 2
Jb    <- 2
mu    <- rep(1, Jn)
sigma <- rep(1, Jn)
p     <- rep(0.3, Jb)
J     <- Jn + Jb
Rho   <- diag(1 - rho, J) + matrix(rho, J, J)
Q             <- mvtnorm::rmvnorm(n, sigma = Rho)
U             <- pnorm(Q)
Yn <- matrix(nrow = n, ncol = Jn)
Yb <- matrix(nrow = n, ncol = Jb)
k <- 0
for ( j in 1:Jn ) {
  k       <- k + 1
  Yn[, j] <- qnorm(U[, k], mean = mu[j], sd = sigma[j])
}
for ( j in 1:Jb ) {
  k <- k + 1
  Yb[, j] <- qbinom(U[, k], size = 1, prob = p[j])
}

standata <- list(
  n = n, Jn = Jn, Jb = Jb, yn = Yn, yb = Yb
)

# model is the compiled CmdStan model of the program above, e.g.
# model <- cmdstanr::cmdstan_model("gaussian_copula.stan")  # file name illustrative
# magrittr/dplyr should be loaded for the pipe used below
fit <- model$sample(
  data = standata
  , iter_warmup = 1000, iter_sampling = 1000
  , chains = 9, parallel_chains = 9
)

fit$summary(variables = c('mu', 'sigma', 'p', 'Corr')) %>% print(n = 200)

This results in posterior samples that reasonably reflect the truth

   variable   mean median     sd    mad    q5   q95  rhat ess_bulk ess_tail
   <chr>     <dbl>  <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
 1 mu[1]     0.947  0.947 0.0451 0.0461 0.874 1.02   1.00    6727.    6624.
 2 mu[2]     0.898  0.899 0.0458 0.0466 0.823 0.974  1.00    6456.    6927.
 3 sigma[1]  1.01   1.01  0.0316 0.0313 0.956 1.06   1.00    7380.    6586.
 4 sigma[2]  1.03   1.03  0.0321 0.0325 0.980 1.09   1.00    7085.    6824.
 5 p[1]      0.259  0.259 0.0192 0.0194 0.228 0.291  1.00    8620.    6578.
 6 p[2]      0.297  0.296 0.0203 0.0202 0.264 0.331  1.00    8743.    6519.
 7 Corr[1,1] 1      1     0      0      1     1     NA         NA       NA 
 8 Corr[2,1] 0.541  0.542 0.0317 0.0323 0.489 0.592  1.00    7311.    6623.
 9 Corr[3,1] 0.468  0.470 0.0497 0.0493 0.384 0.548  1.00    3634.    5343.
10 Corr[4,1] 0.449  0.451 0.0498 0.0498 0.364 0.527  1.00    4239.    5511.
11 Corr[1,2] 0.541  0.542 0.0317 0.0323 0.489 0.592  1.00    7311.    6623.
12 Corr[2,2] 1      1     0      0      1     1     NA         NA       NA 
13 Corr[3,2] 0.479  0.480 0.0485 0.0485 0.397 0.556  1.00    3478.    4987.
14 Corr[4,2] 0.459  0.460 0.0490 0.0484 0.377 0.538  1.00    4263.    5900.
15 Corr[1,3] 0.468  0.470 0.0497 0.0493 0.384 0.548  1.00    3634.    5343.
16 Corr[2,3] 0.479  0.480 0.0485 0.0485 0.397 0.556  1.00    3478.    4987.
17 Corr[3,3] 1      1     0      0      1     1     NA         NA       NA 
18 Corr[4,3] 0.373  0.375 0.0694 0.0701 0.255 0.485  1.01    2242.    3869.
19 Corr[1,4] 0.449  0.451 0.0498 0.0498 0.364 0.527  1.00    4239.    5511.
20 Corr[2,4] 0.459  0.460 0.0490 0.0484 0.377 0.538  1.00    4263.    5900.
21 Corr[3,4] 0.373  0.375 0.0694 0.0701 0.255 0.485  1.01    2242.    3869.
22 Corr[4,4] 1      1     0      0      1     1     NA         NA       NA 

@tarheel that’s what I’ve implemented above (you can see the current case study here: Copulas)

Regarding my earlier statement that

  rtn[1][n, j] = inv_Phi(Lbound + UmL * u_raw[n, j]);

is just the mapping from a discrete cdf to a continuous one.

That was my sloppy writing and therefore not correct. What I wanted to say is:
Consider the discrete case. inv_Phi(Lbound + UmL * u_raw[n, j]) is just \Phi^{-1}(F(x-1) + f(x)\,u) with u \sim U[0,1] in the case x > 0. For x = 0 it is \Phi^{-1}(f(x)\,u), where f(x) is the pmf of the distribution and
F(x) - F(x-1) = f(x).
Thus UmL = f(x).
And therefore:

      rtn[1][n, j] = inv_Phi(Lbound + UmL * u_raw[n, j]);
      rtn[2][n, j] = log(UmL);

Can be coded as:

      rtn[1][n, j] = inv_Phi(Lbound + exp(poisson_lpmf(y[n, j] | mu_glm_exp[n, j])) * u_raw[n, j]);
      rtn[2][n, j] = poisson_lpmf(y[n, j] | mu_glm_exp[n, j]);

We see the parallels with the continuous case:
rtn[1][n, j] corresponds to inv_Phi(cdf)
and
rtn[2][n, j] corresponds to the log(pdf) or log(pmf).
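
A quick numeric check of F(x) - F(x-1) = f(x) in R (my own, not from the post):

lambda <- 1.7
y <- 3
ppois(y, lambda) - ppois(y - 1, lambda)   # UmL
dpois(y, lambda)                          # f(y), identical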

My goal is to understand the difference between your approach and spinkney’s.
Perhaps by combining elements from both methods, we can optimize our approach.

My goal is to understand the difference between your approach and spinkney’s.

My point is that they are identical. One approach manually rescales the u variables to the appropriate bounds, and the other declares the variables with the bounds directly.


And just to re-emphasize, the function you’re referencing is specifically returning the Jacobian adjustment as the second element, as it says in the doc:

 * @return 2D array of matrices containing the random variables
 *         and jacobian adjustments

Exactly! It just corresponds to the pdf/pmf.

Exactly! It just corresponds to the pdf/pmf.

And because it’s a Jacobian adjustment for re-scaling a variable to specified bounds, it will also be applied by Stan when declaring a bounded parameter.

And note that it only corresponds to the pmf in the discrete case because the parameter has already been bounded to (0,1) and the Jacobian adjustment for that has been applied.
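
To spell out the Jacobian (my addition, using the names from the function above): with v = Lbound + UmL * u_raw and u_raw \sim U(0,1), we have dv/du_raw = UmL = F(y) - F(y^-), so the log-Jacobian is \log(UmL), which for a discrete margin equals \log f(y); that is exactly what rtn[2][n, j] = log(UmL) contributes.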


Again, the two approaches are statistically identical.


One more thing, about missing data.
Specifying them as u \sim U[0, 1] and applying \Phi^{-1} resulted in slow sampling and large treedepths.
Instead I sampled them as standard normal and added them unmodified to the correlation matrix. This gave a significant performance improvement.

This vignette gets my vote for the best vignette I’ve seen in the past few years. If you ever want to enlarge it I’d love to see an extension to a copula for three outcomes: time to event modeled with a flexible parametric proportional hazards model as in rstanarm survival functions, an ordinal outcome modeled with a proportional odds model, and a second ordinal outcome with different Y categories than the first ordinal outcome.


@andre.pfeuffer Could you actually please share the Stan code for this? I’m not sure what you mean by “adding them unmodified to the correlation matrix”.

@Greg94 I used spinkney’s implementation, but it can easily be adapted to Andrew’s version.
Let’s have a look at the normal marginal function first:

array[] matrix normal_marginal(matrix y, matrix mu_glm, vector sigma) {
  int N = rows(mu_glm);
  int J = cols(mu_glm);
  array[2] matrix[N, J] rtn;
  // Initialise the jacobian adjustments to zero, as vectorised lpdf will be used
  rtn[2] = rep_matrix(0, N, J);
  
  for (j in 1 : J) {
    rtn[1][ : , j] = (y[ : , j] - mu_glm[ : , j]) / sigma[j];
    rtn[2][1, j] = normal_lpdf(y[ : , j] | mu_glm[ : , j], sigma[j]);
  }
  
  return rtn;
}

Let’s say you have missing data. We declare the missing entries as vector[Nmiss] y_miss in the parameters block, with Nmiss the number of missing values. Since sigma = 1 and mu_glm = 0, we can simply handle them as:

...
  for (j in 1 : J) {
    // sketch: fill the column holding the missing entries with the latent
    // standard-normal parameters and add their log density
    rtn[1][ : , j] = y_miss[j];
    rtn[2][1, j] = std_normal_lupdf(y_miss[j]);
  }
 }

Note that for a parameter u_{miss} \sim U[0,1], it follows that \Phi^{-1}(u_{miss}) \sim N(0,1). For the missing entries it doesn’t matter whether the original outcome is discrete or continuous.
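
A quick R check of that reparameterisation (my own): \Phi^{-1} of a U(0,1) draw is standard normal, so sampling the missing entries directly as standard normal is equivalent and avoids the extra transform.

set.seed(1)
u <- runif(1e5)
q <- qnorm(u)
c(mean(q), sd(q))   # approximately 0 and 1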
