Feature request: copulas for multivariate responses of mixed types

That rocks!

@Robert check it out!

1 Like

@spinkney Thanks. That looks interesting. I look forward to more copula functions being in stan officially.

I took a stab at programming a Gaussian copula regression model with mixed margins (1 Gaussian, 1 Bernoulli). The code is based on Smith and Khaled (2012), who indicate that the joint density between \mathbf{y} and \mathbf{u} is

f(\mathbf{y}, \mathbf{u} | \mathbf{\Gamma}, \mathbf{\Theta}) = c(\mathbf{u} | \mathbf{\Gamma}) \times \prod_{j=1}^J I(a_j < u_j < b_j),

where b_j = F_j(y_j | \theta_j) and a_j = F_j(y_j^{-} | \theta_j), \theta_j's are the regression parameters for outcome j, and \mathbf{\Gamma} is a correlation matrix. The function c(\mathbf{u} | \mathbf{\Gamma}) is the Gaussian copula density function given by

\begin{align} c(\mathbf{u} | \mathbf{\Gamma}) &\propto \lvert \mathbf{\Gamma} \rvert^{-1/2} \exp\left\{ -\frac{1}{2} \Phi^{-1}(\mathbf{u}) ' (\mathbf{\Gamma}^{-1} - \mathbf{I}) \Phi^{-1}(\mathbf{u}) \right\}. \end{align}

Suppose we have n observations from the copula and let \mathbf{z}_i = \Phi^{-1}(\mathbf{u}_i). Then

\begin{align} c(\mathbf{u}_1, \ldots, \mathbf{u}_n | \mathbf{\Gamma}) \propto \lvert \mathbf{\Gamma} \rvert^{-n/2} \exp\left\{ -\frac{1}{2} \text{tr}\left( (\mathbf{\Gamma}^{-1} - \mathbf{I}) \mathbf{Z}'\mathbf{Z} \right) \right\}, \end{align}

where \mathbf{Z} = (\mathbf{z}_1, \ldots, \mathbf{z}_n)'. This function is coded as gaussian_copula_lp in the Stan code below.

The function get_bounds_bernoulli returns a n \times 2 matrix giving a_{ij} and b_{ij}, but I truncated the values (minval and maxval) so that \Phi^{-1}(u) would not result in \pm \infty. Not sure if this maybe is why the correlation is off, but I don’t think there’s any way around it.

functions{
  real gaussian_copula_lp(matrix U, matrix Gamma) {
      int n = rows(U);
      target += -0.5 * ( n * log_determinant(Gamma) + sum( add_diag(inverse_spd(Gamma), -1.0) .* crossprod( inv_Phi(U) ) ) );
      return target();
  }
  
  matrix get_bounds_bernoulli(int[] y, vector eta, real minval, real maxval) {
    int n = size(y);
    matrix[n,2] bounds;
    vector[n] mu;
    
    // get mean
    mu = inv_logit(eta);     // assume logistic link for time being
    
    // fill in bounds; first is invNormCDF(y-1), second is invNormCDF(y)
    for ( i in 1:n ) {
      if ( y[i] == 0 ) {
        bounds[i,1] = minval;
        bounds[i,2] = bernoulli_cdf( y[i], mu[i] );
      } else {
        bounds[i,1] = bernoulli_cdf(y[i]-1, mu[i]);
        bounds[i,2] = maxval;
      }
    }
    return bounds;
  }
}

data {
  int<lower=0>                  N;
  int<lower=0>                  pnormal;
  int<lower=1>                  pbernoulli;
  real                          ynormal[N];
  int<lower=0,upper=1>          ybernoulli[N];
  matrix[N,pnormal]             Xnormal;
  matrix[N,pbernoulli]          Xbernoulli;
  vector[pnormal]               beta0normal;
  matrix[pnormal, pnormal]      Sigma0normal;
  real<lower=0>                 tau_shape;
  real<lower=0>                 tau_rate;
  vector[pbernoulli]            beta0bernoulli;
  matrix[pbernoulli,pbernoulli] Sigma0bernoulli;
}


parameters {
  vector[pnormal]         beta1;     // reg coefs for gaussian variable
  real<lower=0>           tau;       // inverse variance for gaussian var
  vector[pbernoulli]      beta2;     // reg coefs for bernoulli variable
  real<lower=0,upper=1>   u2free[N]; // latent variable for bernoulli variable (not fully constrainted)
  cholesky_factor_corr[2] L;         // Cholesky decomposition of 2x2 correlation matrix
}

transformed parameters{
  real         u2[N];
  vector[N]    eta2;
  matrix[N, 2] bounds;

  // obtain bounds
  eta2   = Xbernoulli * beta2;                                        // linear predictor for bernoulli outcome
  bounds = get_bounds_bernoulli(ybernoulli, eta2, 1.0e-7, 1-1.0e-7);   // N x 2 matrix of bounds for u2

  // bound the u2 parameter
  for ( i in 1:N ) {
    u2[i] = bounds[i,1] + (bounds[i,2] - bounds[i,1]) * u2free[i];
  }
}


model {
  
  real sigma = sqrt(inv(tau));
  vector[N] eta1 = Xnormal * beta1;
  matrix[N, 2] U;
  matrix[2,2] Gamma;
  
  // u1 is deterministic since it is continuous
  for ( i in 1:N ) {
    U[i,1] = normal_cdf(ynormal[i], eta1[i], sigma);
    U[i,2] = u2[i];
  }
  
  // obtain correlation matrix of (u1, u2)
  Gamma = tcrossprod(L);
  
  // priors
  beta1 ~ multi_normal(beta0normal,    Sigma0normal);
  beta2 ~ multi_normal(beta0bernoulli, Sigma0bernoulli);
  tau   ~ gamma(tau_shape, tau_rate);
  L     ~ lkj_corr_cholesky(1.0);                     // uniform prior
  
  // add normal likelihood for ynormal
  target += normal_lpdf(ynormal | eta1, sigma);
  
  // add on copula likelihood
  target += gaussian_copula_lp(U, Gamma);
}

The R code utilized to simulate the data is

n     = 100
Gamma = cbind(c(1, .5), c(.5, 1))

## create uniform random variables via Gaussian copula
U     = pnorm( mvtnorm::rmvnorm(n, sigma = Gamma) )
X     = cbind(1, rnorm(n, mean = 5, sd = 5/3))
beta  = c(-5, 1)
eta   = X %*% beta
mu1   = eta
mu2   = binomial()$linkinv(eta)
y1    = qnorm(U[, 1], mean = mu1, sd = 2)
y2    = qbinom(U[, 2], size = 1, prob = mu2)
data  = data.frame(y1 = y1, y2 = y2, x = X[, 2])

The regression coefficients looked pretty good, but the correlation was pretty off (posterior mean = 0.82 based on effective sample size of ~500).

Curious if anyone has any ideas if there’s something going awry in my logic.

4 Likes

As an update, if I set minval = 1.0e-100 and maxval = 1.0 - 1.0e-100, I got a divergent transition.

Let me know if anyone spots any obvious reasons why the correlation seems off. I am wondering if it is because of the indicator function being absent in the target probability:

\prod_{i=1}^n I(a_{i2} \le u_{i2} \le b_{i2})

I thought that by setting the varying bounds of u2 it would solve this implicitly.

2 Likes

I don’t understand your model or any of those papers :D so I just did what I thought made sense (caveat emptor, this may not generally work either) and I’m able to recover the covariance and parameters for this specific simulation. I simplified the problem by removing the covariates. I’m interested if it works well or has problems in general.

First the simulation. I’m creating a continuous RV from a standard normal and a binary RV with probability of success of 0.6. The correlation coefficient is \rho = 0.8.

r = 0.8 # correlation coefficient
sigma = matrix(c(1,r,r,1), ncol=2)
s = chol(sigma)
n <- 100
z = s %*% matrix(rnorm(n * 2), nrow = 2 )
u = pnorm(z)
success = 1 * (u[2,] > 0.4)

stan_dat <- list(
  N = n,
  y1 = data$y2,
  y2 = data$y1
)

mod_mix <- mod$sample(
  data = stan_dat,
  parallel_chains = 4,
  chains = 4,
  iter_warmup = 500,
  iter_sampling = 1000,
  adapt_delta = 0.8,
  max_treedepth = 10
  )
   

For the stan code I think you should check out my helpful functions repo and use that multivariate normal copula as it will be much more preformant than the naive way. You can see the derivation in the docs of that repo. Anyway, I create a continuous latent parameter that governs how the discrete outcome is generated. It’s a pseudo-regression where the continuous latent parameter is on the real line then transformed via the link function (inv_logit).

functions {
  real multi_normal_copula_lpdf(matrix u, matrix L){
   int K = rows(u);
   int N = cols(u);
   real inv_sqrt_det_log = sum(log(diagonal(L)));
   matrix[K, N] x = mdivide_left_tri_low(L, u);

   return -N * inv_sqrt_det_log - 0.5 * sum(columns_dot_self(x) - columns_dot_self(u));
}
}
data {
  int<lower=0> N;
  int y1[N];
  vector[N] y2;
}
parameters {
  vector[N] u;
  cholesky_factor_corr[K] L;
} 
transformed parameters {
  vector[N] mu = inv_logit(u);
}
model {
  y1 ~ bernoulli(mu);
  y2 ~ std_normal();
  u ~ std_normal();

  {
    matrix[K, N] Y;

    for (n in 1:N){
      Y[1, n] = inv_Phi(normal_cdf(u[n], 0, 1));
      Y[2, n] = inv_Phi(normal_cdf(y2[n], 0, 1));
    }
 
    Y ~ multi_normal_copula(L);
  }
}
generated quantities {
  matrix[K, K] Sigma = multiply_lower_tri_self_transpose(L);
}

The covariance matrix looks like

# A tibble: 4 x 10
  variable    mean median    sd    mad    q5   q95  rhat ess_bulk ess_tail
  <chr>      <dbl>  <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
1 Sigma[1,1] 1      1     0     0      1     1     NA         NA       NA 
2 Sigma[2,1] 0.836  0.855 0.102 0.0957 0.638 0.964  1.03     168.     277.
3 Sigma[1,2] 0.836  0.855 0.102 0.0957 0.638 0.964  1.03     168.     277.
4 Sigma[2,2] 1      1     0     0      1     1     NA         NA       NA 
3 Likes

I think this may work for the normal-Bernoulli case (is this the Chib representation?), but I’m not sure this works in general.

Seeking a general solution, I wonder if we can utilize the multivariate truncated normal code of @bgoodri.

Suppose we have J-dimensional vector of which the first J_1 are from continuous densities and the next J_2 = J - J_1 are from discrete mass functions. Write the “latent” variables as \mathbf{u} = (\mathbf{u}_1', \mathbf{u}_2')' (of course u_{1j} = F(y_{1j} | \theta_j) is not latent given \mathbf{y}_1) Then we may write the joint likelihood of the observed and latent variables as

c(\mathbf{y}, \mathbf{u}_1, \mathbf{u}_2 | \mathbf{\Gamma}) = c(\mathbf{u}_1 | \mathbf{\Gamma}_{11}) \prod_{j=1}^{J_1} f_j(y_j | \theta_j) \times c(u_2 | u_1) \prod_{j=J_1 + 1}^{J} I(a_{2j} < u_{2j} < b_{2j})

where c(u_2 | u_1) is the conditional Gaussian copula density function, a_{2j} = F(y_{2j} - 1 | \theta_{2j}), and b_{2j} = F(y_{2j} | \theta_{2j}). The key question basically is how can we implement this likelihood in Stan? Note that in binomial models, either a_{2j} = 0 or b_{2j} = 1, so that, either \Phi^{-1}(a_{2j}) = -\infty or \Phi^{-1}(b_{2j}) = \infty.

I won’t go into the details, but one can show that if you consider the transformation z_{kj} = \Phi^{-1}(u_{kj}) for k = 1, 2, j = 1, \ldots, J_k, we can write

f(\mathbf{y}, \mathbf{z}_1, \mathbf{z}_2) = \phi_{J_1}(\mathbf{z}_1 | \mathbf{0}, \mathbf{\Gamma}_{11}) \times \left[ \prod_{j=1}^{J_1} f_j(y_j | \theta_j) \right] \times \left[ \phi_{J_2}(\mathbf{z}_2 | \Gamma_{21} \Gamma_{11}^{-1} z_1, \Gamma_{22} - \Gamma_{21} \Gamma_{11}^{-1} \Gamma_{21}) \times \prod_{j=J_1 + 1}^J I(\Phi^{-1}(a_{2j} )< z_{2j} < \Phi^{-1}(b_{2j})) \right]

where I used the partition

\Gamma = \begin{pmatrix} \Gamma_{11} & \Gamma_{12} \\ \Gamma_{21} & \Gamma_{22} \end{pmatrix}

Thus, the likelihood can be decomposed as a (untruncated) MVN (which is easy), a product of known density functions (again, easy), and a truncated multivariate normal density, where the bounds are changing at every iteration. In Ben’s code, he mostly does the development when there is either an upper or lower bound (although he does offer a note about when there’s both an upper and lower bound). The general solution for a copula would be to allow for both an upper and lower bound. For example, if we have a Poisson variable and the observed value is nonzero, there would be both an upper and a lower bound on the latent variable.

I think the big challenge is how to handle the varying bounds at each iteration and implement that into the multivariate truncated normal.

4 Likes

Wanted to suggest something like that. Ben’s code worked pretty well for multivariate ordinal models (where you have both bounds), so it could - at least in principle work for others as well.

2 Likes

Thank you for this clear exposition. My background on copulas is recent, just learned about them a few months ago. I’m not well versed on the literature of mixing discrete with continuous outcomes so not sure if my approach above is even valid.

I’m interested if you can get the tMVN from Ben to work, as I think lots of other people are too.

1 Like

I didn’t want you to go empty handed so here’s the tMVN that does both bounds. I modified the signature to take in a vector of lower and upper bounds. The input s is now just 1 if there are bounds and 0 if not. You put in -Inf and +Inf if there are no bounds.

functions {
  vector[] make_stuff(vector mu, matrix L, vector lb, vector ub, vector s, vector u) {
    int K = rows(mu); 
    vector[K] d; 
    vector[K] z; 
    vector[K] out[2];
    
    for (k in 1:K) {
      int km1 = k - 1;
      if (s[k] != 0) {
        vector[2] z_star = ([lb[k], ub[k]]' - (mu[k] + ((k > 1) ? L[k, 1:km1] * head(z, km1) : 0))) / L[k, k];
        vector[2] u_star = Phi(z_star);
        
        real v = u_star[1] + (u_star[2] - u_star[1]) * u[k];

        d[k] = u_star[2] - u_star[1];
      
        z[k] = inv_Phi(v);
      }
      else {
        z[k] = inv_Phi(u[k]);
        d[k] = 1;
      }
    }
    out[1] = z;
    out[2] = d;
    return out;
  }
}
data {
  int<lower=2> K;             // number of dimensions
  vector[K] lb;                // lower bound
  vector[K] ub;
  
  // s[k] ==  0 implies no constraint; otherwise 
  // s[k] == -1 -> b[k] is an upper bound 
  // s[k] == +1 -> b[k] is a lower bound
  vector<lower=0, upper=1>[K] s;
  
  vector[K] mu;
  cholesky_factor_cov[K,K] L;
}
parameters {
  vector<lower=0,upper=1>[K] u;
}
model {
  target += log(make_stuff(mu, L, lb, ub, s, u)[2]); // Jacobian adjustments
  // implicit: u ~ uniform(0,1)
}
generated quantities {
  vector[K] y = mu + L * make_stuff(mu, L, lb, ub, s, u)[1];
}
library(rstan)

expose_stan_functions("tMVN_both.stan")

K <- 2
rho <- 0.5
Sigma <- matrix(c(1,rho,rho,1), K, K)
lb <- c(1, 1.5)
ub <- c(1.3, 1.9)
mu <- c(0, 0)
L <- t(chol(Sigma))
s <- c(1, 1)
u <- runif(2)
make_stuff(mu, L, lb, ub, s, u)
### returns
> make_stuff(mu, L, lb, ub, s, u)
[[1]]
[1] 1.012789 1.395718

[[2]]
[1] 0.06185477 0.07183871

### if you have no bounds put in Inf like in 
lb <- c(-Inf, 1.5)
ub <- c(1.3, 1.9)
> make_stuff(mu, L, lb, ub, s, u)
[[1]]
[1] -0.6108438  2.2556763

[[2]]
[1] 0.90319952 0.01310842

Note

With certain correlations, certain “unlikely” bounds are not respected. Not sure if that’s expected or not. For eg, the above with -Inf will sometimes give answers that are > 1.9 for the second variate (as in the 2.25 you see). I thought this was an error but I’m not sure. It may be that the correlation and bounds are invalid together. Something to potentially investigate more.

When there is no correlation this pathology doesn’t appear to exist

> out <- list()
> for (i in 1:10) {
+   u <- runif(2)
+   out[[i]] <- make_stuff(mu, L, lb, ub, s, u)[[1]]
+ }
> do.call(rbind, out)
             [,1]     [,2]
 [1,] -1.35995114 1.778739
 [2,] -2.04867457 1.618372
 [3,] -0.25643875 1.516963
 [4,] -0.09994477 1.584373
 [5,] -1.54324832 1.731444
 [6,]  1.07798076 1.726483
 [7,]  0.57358502 1.567540
 [8,] -1.10508155 1.781897
 [9,]  0.01867937 1.707072
[10,] -0.29349804 1.868850

If the correlation is set to 0.05, I get 52/1000 that are over 1.9. If \rho = 0.5 then 572/1000 are over 1.9.

2 Likes

Thanks! In the interim, I did something similar to allow both bounds. In this code, I was able to extract a reasonably close value for the posterior mean of the correlation, and the posterior mean / variance of the regression coefficients matched up exactly with the MLEs

It’s probably not the most optimal way to do it (I’m decent at programming, but definitely more on the statistics side of things), but it appears to work. If there’s a better way to optimize it, please let me know.

functions{
  //-----------------------------------------------------------------
    // function to increment log target probability for truncated MVN
  //     * @param u   K-dimensional vector in (0,1)^K
  //     * @param mu  marginal mean of truncated MVN
  //     * @param L   lower cholesky factor of covariance matrix
  //     * @param lb  lower bounds of u's (ignored if lb_ind == 0)
  //     * @param ub  upper bounds of u's (ignored if ub_ind == 0)
  //
    //      * @return K-dim vector giving log target increment
  //-----------------------------------------------------------------
    vector multi_normal_truncated(vector u, vector mu, matrix L, vector lb, vector ub, vector lb_ind, vector ub_ind) {
      int K = rows(u); 
      vector[K] d; 
      vector[K] z;
      // vector[K] out[2];
      
      for ( k in 1:K ) {
        // if unbounded
        if ( lb_ind[k] == 0 && ub_ind[k] == 0 ) {  
          z[k] = inv_Phi(u[k]);
          d[k] = 1;
        }
        else { 
          int km1 = k-1;
          real v; 
          real z_star;
          
          // lower bound, no upper bound
          if ( lb_ind[k] == 1 && ub_ind[k] == 0 ) {
            real u_star;
            z_star = (lb[k] - (mu[k] + ((k > 1) ? L[k,1:km1] * head(z, km1) : 0) ) ) / L[k,k];
            u_star = Phi(z_star);
            d[k]   = 1 - u_star;
            v      = u_star + d[k] * u[k];                 // v ~ U(u_star, 1)
          }
          // upper bound, no lower bound
          else if ( lb_ind[k] == 0 && ub_ind[k] == 1 ) {  // upper bound, no lower bound
            real u_star;
            z_star = (ub[k] - (mu[k] + ((k > 1) ? L[k,1:km1] * head(z, km1) : 0) ) ) / L[k,k];
            u_star = Phi(z_star);
            v      = u_star * u[k];                        // v ~ U(0, u_star)
            d[k]   = u_star;
          }
          // both lower and upper bounds
          else {
            // compute u_star for upper and lower bounds
            real u_star_lb;
            real u_star_ub;
            u_star_lb = Phi( (lb[k] - (mu[k] + ((k > 1) ? L[k,1:km1] * head(z, km1) : 0) ) ) / L[k,k] );
            u_star_ub = Phi( (ub[k] - (mu[k] + ((k > 1) ? L[k,1:km1] * head(z, km1) : 0) ) ) / L[k,k] );
            
            d[k] = (u_star_ub - u_star_lb);
            v    = u_star_lb + (u_star_ub - u_star_lb) * u[k];    // v ~ U(u_star_lb, u_star_ub)
          }
          z[k] = inv_Phi(v);
        }
      }
      return log(d);
    }
  //-----------------------------------------------------------------
    // function to CHECK whether bernoulli var has upper/lower bound
  //     * @param y   N-dim integer array of 0's and 1's
  //
    //      * @return Nx2 matrix giving indicator of bounds:
    //      *   1st column is indicator of lb, 2nd is indicator of ub
  //-----------------------------------------------------------------
    matrix bounds_indicator_bernoulli(int[] y) {
      int n = size(y);
      matrix[n,2] bound_ind;
      for ( i in 1:n ) {
        // if success, there is a lower bound but no upper bound
        if (y[i] == 1) {
          bound_ind[i,1] = 1;
          bound_ind[i,2] = 0;
        }
        // if failure, there is upper bound but no lower bound
        else {
          bound_ind[i,1] = 0;
          bound_ind[i,2] = 1;
        }
      }
      return bound_ind;
    }
  //------------------------------------------------------------------------
    // function to OBTAIN bounds of bernoulli variable given linear predictor
  //     * @param y   N-dim integer array of 0's and 1's
  //     * @param eta N-dim vector giving linear predictor
  //
    //      * @return Nx2 matrix giving bounds:
    //      *   1st column is lower bound, 2nd is upper bound
  //------------------------------------------------------------------------
    matrix bounds_bernoulli(int[] y, vector eta) {
      int n = size(y);
      matrix[n,2] bounds;
      vector[n] mu;
      bounds = rep_matrix(0, n, 2);
      
      // get mean
      mu = inv_logit(eta);     // assume logistic link for time being
      
      // fill in bounds
      for ( i in 1:n ) {
        // if y = 0, there is an upper bound and no lower bound
        if ( y[i] == 0 ) {
          bounds[i,2] = inv_Phi( bernoulli_cdf(0, mu[i]) );
        } 
        // if y = 1, there is a lower bound and no upper bound
        else {
          bounds[i,1] = inv_Phi( bernoulli_cdf(0, mu[i]) );
        }
      }
      return bounds;
    }
}

data {
  int<lower=0>                  N;
  int<lower=0>                  pnormal;
  int<lower=1>                  pbernoulli;
  real                          ynormal[N];
  int<lower=0,upper=1>          ybernoulli[N];
  matrix[N,pnormal]             Xnormal;
  matrix[N,pbernoulli]          Xbernoulli;
  vector[pnormal]               beta0normal;
  matrix[pnormal, pnormal]      Sigma0normal;
  real<lower=0>                 tau_shape;
  real<lower=0>                 tau_rate;
  vector[pbernoulli]            beta0bernoulli;
  matrix[pbernoulli,pbernoulli] Sigma0bernoulli;
}

transformed data {
  // obtain matrix giving whether ybernoulli has upper/lower bound
  matrix[N,2] u2_bounds_ind;
  vector[2] zeros = rep_vector(0, 2);
  vector[2] mu    = zeros;
  u2_bounds_ind = bounds_indicator_bernoulli(ybernoulli);
}

parameters {
  vector[pnormal]         beta1;     // reg coefs for gaussian variable
  real<lower=0>           tau;       // inverse variance for gaussian var
  vector[pbernoulli]      beta2;     // reg coefs for bernoulli variable
  real<lower=0,upper=1>   u2[N];     // latent variable for bernoulli
  cholesky_factor_corr[2] L;         // Cholesky decomposition of 2x2 correlation matrix
}

model {
  
  real sigma     = sqrt(inv(tau));         // standard deviation for normal variable
  vector[N] eta1 = Xnormal    * beta1;     // linear predictor for normal variable
  vector[N] eta2 = Xbernoulli * beta2;     // linear predictor for bernoulli variable
  matrix[N, 2] u2bounds;                   // matrix storing bounds for uniform generated variable for bernoulli
  vector[2] lb;                            // lower bound for u's
  vector[2] ub;                            // upper bound for u's
  vector[2] lb_ind;                        // indicates if there is a  lower bound (first element 0 since continuous)
  vector[2] ub_ind;                        // indicates if there is an upper bound (first element 0 since continuous)
  
  // initialize vectors with 0s
  lb     = zeros;               // lower bound of u's (ignored if lb_ind == 0)
  ub     = zeros;               // upper bound of u's (ignored if ub_ind == 0)
  lb_ind = zeros;               // indicates if there is a  lower bound for u's
  ub_ind = zeros;               // indicates if there is an upper bound for u's
  
  // get bounds for u2
  u2bounds = bounds_bernoulli(ybernoulli, eta2);
  
  // priors
  beta1 ~ multi_normal(beta0normal,    Sigma0normal);
  beta2 ~ multi_normal(beta0bernoulli, Sigma0bernoulli);
  tau   ~ gamma(tau_shape, tau_rate);
  L     ~ lkj_corr_cholesky(1.0);                     // uniform prior
  
  ynormal ~ normal(Xnormal * beta1, sigma);
  
  for ( i in 1:N ) {
    vector[2] u;
    u[1]      = Phi_approx((ynormal[i] - eta1[i]) / sigma);
    u[2]      = u2[i];
    lb[2]     = u2bounds[i,1];
    ub[2]     = u2bounds[i,2];
    lb_ind[2] = u2_bounds_ind[i,1];
    ub_ind[2] = u2_bounds_ind[i,2];
    
    target += multi_normal_truncated(
      u, mu, L, lb, ub, lb_ind, ub_ind
    );
  }
}


generated quantities {
  corr_matrix[2] Gamma;
  Gamma = tcrossprod(L);
}
3 Likes

And your code has the same problem as mine

K <- 2
rho <- -.5
Sigma <- matrix(c(1,rho,rho,1), K, K)
lb <- c(1.5, 1.5)
ub <- c(1.3, 1.9)
mu <- c(0, 0)
L <- t(chol(Sigma))
s <- c(1, 1)
out <- list()
lb_ind <- c(1, 1)
ub_ind <- c(1, 1)
multi_normal_truncated(mu, L, lb, ub, lb_ind, ub_ind, u)
[1] 1.429983 2.750805
1 Like

I’m AFK at the moment but it looks like your lower bound for the first element is larger than your upper bound. Does that change anything?

No, it does not but you’d think it would! Anyway, tested some more and our code is equivalent.

Edit: See the following post that you cannot just use expose_stan_functions to generate random variates as this does not take the jacobian into account. These plots just show what happens when there is no jacobian!

K <- 2
rho <- -.5
Sigma <- matrix(c(1,rho,rho,1), K, K)
lb <- c(1, 1.5)
ub <- c(1.3, 1.9)
mu <- c(0, 0)
L <- t(chol(Sigma))
s <- c(1, 1)
out <- list()
lb_ind <- c(1, 1)
ub_ind <- c(1, 1)

out_sp <- list()
out_tar <- list()
for (i in 1:10000) {
 u <- runif(2)
 out_sp[[i]] <- make_stuff(mu, L, lb, ub, s, u)[[1]]
 out_tar[[i]] <- multi_normal_truncated(mu, L, lb, ub, lb_ind, ub_ind, u)
}
out_sp <- do.call(rbind, out_sp)
out_tar <- do.call(rbind, out_tar)

which(out_sp != out_tar)
> integer(0)

The interesting part is when I plot the output. It should be in a box bounded by those lower and upper bounds, however…

If instead I set the correlation to 0 it satisfies those bounds

Plotting the full MVN with rho = -0.5 and the box of bounds does show density there

Now, if I remove the lower bound (which indeed your code and mine are equivalent which I checked again - though with mine you can remove the extra *_ind things and shorten the code, but I digress). Here’s the plot of the output that shows the grayed out portion which violates the bounds:


I guess “there’s something strange in the neighborhood…”
Once again the 0 corr plot satisfies those bounds so it’s an issue with the correlation adjustment

2 Likes

To add insult to injury for that first plot, the correlation appears to be positive and not negative.

Not only does it not obey the bounds, it doesn’t obey the correlation.

1 Like

Yea, but the key missing component is that the jacobian updates the space. So what I showed above is without the jacobian adjustment. Indeed when I sample with the adjustment things turn out nicely.

Ooh, multivariate stats is beautiful (rho = -0.5)

And rho = -0.8

3 Likes

@tarheel do you have the derivation of your formula above or a paper to point to? That may also have the general case when you have J discrete and K continuous margins?

The derivation is from that Smith and Khaled (2012) paper I pointed to earlier. Check out Section 6.1 and proposition 3, in particular you can combine equations (6.1) and the representation of f(u_D | x) in bullet point (i) in that proposition to get the augmented likelihood.

Also be sure to check out Equation (2.3) (for the fully discrete setting) as that sets the basis for the mixed setting.

The difference between this and their paper is they’re focused on Gibbs-like sampling, so they work out the full conditionals and stuff. Luckily, that doesn’t need to be done (a distinct advantage of Stan for copulas–if the likelihood can be implemented).

The copula is decomposed there as
c(u_C, u_D) = c(u_C) c(u_D | u_C)

but there’s no reason to decompose in general. We just need the joint likelihood!

4 Likes

@tarheel did most of the heavy lifting. I just packaged it up with a bow. Here’s the discrete-continuous multi-variate gaussian case. It uses the truncated MVN to derive the bounds from the latent continuous margins needed for the discrete margins.

It’s in PR status but you can check it out at
added mixed_copula by spinkney · Pull Request #13 · spinkney/helpful_stan_functions (github.com)

You should be able to add gaussian, bernoulli, and poisson or leave any one of those out and the code should run. If you want to add different margins it’s a bit of work but do-able. Ideally there’d be a functor that could take in the margin distribution and derive outcome. Until then you’ll have to add it - pr’s are welcome in the helpful functions repo :)

4 Likes