That rocks!
@Robert check it out!
@spinkney Thanks. That looks interesting. I look forward to more copula functions being in Stan officially.
I took a stab at programming a Gaussian copula regression model with mixed margins (1 Gaussian, 1 Bernoulli). The code is based on Smith and Khaled (2012), who indicate that the joint density between \mathbf{y} and \mathbf{u} is
f(\mathbf{y}, \mathbf{u} | \boldsymbol{\theta}, \mathbf{\Gamma}) = c(\mathbf{u} | \mathbf{\Gamma}) \prod_{j \,\text{continuous}} f_j(y_j | \theta_j) \prod_{j \,\text{discrete}} 1\{a_j \le u_j \le b_j\},
where b_j = F_j(y_j | \theta_j) and a_j = F_j(y_j^{-} | \theta_j), the \theta_j's are the regression parameters for outcome j, and \mathbf{\Gamma} is a correlation matrix. The function c(\mathbf{u} | \mathbf{\Gamma}) is the Gaussian copula density function given by
c(\mathbf{u} | \mathbf{\Gamma}) = |\mathbf{\Gamma}|^{-1/2} \exp\left\{ -\tfrac{1}{2} \, \mathbf{z}' \left( \mathbf{\Gamma}^{-1} - \mathbf{I} \right) \mathbf{z} \right\}, \qquad \mathbf{z} = \Phi^{-1}(\mathbf{u}) \text{ (elementwise)}.
Suppose we have n observations from the copula and let \mathbf{z}_i = \Phi^{-1}(\mathbf{u}_i). Then
\sum_{i=1}^n \log c(\mathbf{u}_i | \mathbf{\Gamma}) = -\frac{1}{2}\left( n \log|\mathbf{\Gamma}| + \mathrm{tr}\left[ (\mathbf{\Gamma}^{-1} - \mathbf{I}) \, \mathbf{Z}'\mathbf{Z} \right] \right),
where \mathbf{Z} = (\mathbf{z}_1, \ldots, \mathbf{z}_n)'. This function is coded as gaussian_copula_lp in the Stan code below.
The function get_bounds_bernoulli returns an n \times 2 matrix giving a_{ij} and b_{ij}, but I truncated the values (minval and maxval) so that \Phi^{-1}(u) would not result in \pm \infty. Not sure if this is maybe why the correlation is off, but I don't think there's any way around it.
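For illustration, here is the endpoint problem the truncation is working around (a quick check of my own, not part of the model):
## inv_Phi / qnorm at the exact endpoints of the Bernoulli CDF is infinite,
## hence the minval / maxval truncation
qnorm(c(0, 1))            # -Inf  Inf
qnorm(c(1e-7, 1 - 1e-7))  # finite, roughly -5.2 and 5.2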
functions{
real gaussian_copula_lp(matrix U, matrix Gamma) {
// returns the Gaussian copula log-density increment; added to target by the caller
int n = rows(U);
return -0.5 * ( n * log_determinant(Gamma) + sum( add_diag(inverse_spd(Gamma), -1.0) .* crossprod( inv_Phi(U) ) ) );
}
matrix get_bounds_bernoulli(int[] y, vector eta, real minval, real maxval) {
int n = size(y);
matrix[n,2] bounds;
vector[n] mu;
// get mean
mu = inv_logit(eta); // assume logistic link for time being
// fill in bounds on the uniform scale; first is F(y-1), second is F(y)
for ( i in 1:n ) {
if ( y[i] == 0 ) {
bounds[i,1] = minval;
bounds[i,2] = bernoulli_cdf( y[i], mu[i] );
} else {
bounds[i,1] = bernoulli_cdf(y[i]-1, mu[i]);
bounds[i,2] = maxval;
}
}
return bounds;
}
}
data {
int<lower=0> N;
int<lower=0> pnormal;
int<lower=1> pbernoulli;
real ynormal[N];
int<lower=0,upper=1> ybernoulli[N];
matrix[N,pnormal] Xnormal;
matrix[N,pbernoulli] Xbernoulli;
vector[pnormal] beta0normal;
matrix[pnormal, pnormal] Sigma0normal;
real<lower=0> tau_shape;
real<lower=0> tau_rate;
vector[pbernoulli] beta0bernoulli;
matrix[pbernoulli,pbernoulli] Sigma0bernoulli;
}
parameters {
vector[pnormal] beta1; // reg coefs for gaussian variable
real<lower=0> tau; // inverse variance for gaussian var
vector[pbernoulli] beta2; // reg coefs for bernoulli variable
real<lower=0,upper=1> u2free[N]; // latent variable for bernoulli variable (not fully constrained)
cholesky_factor_corr[2] L; // Cholesky decomposition of 2x2 correlation matrix
}
transformed parameters{
real u2[N];
vector[N] eta2;
matrix[N, 2] bounds;
// obtain bounds
eta2 = Xbernoulli * beta2; // linear predictor for bernoulli outcome
bounds = get_bounds_bernoulli(ybernoulli, eta2, 1.0e-7, 1-1.0e-7); // N x 2 matrix of bounds for u2
// bound the u2 parameter
for ( i in 1:N ) {
u2[i] = bounds[i,1] + (bounds[i,2] - bounds[i,1]) * u2free[i];
}
}
model {
real sigma = sqrt(inv(tau));
vector[N] eta1 = Xnormal * beta1;
matrix[N, 2] U;
matrix[2,2] Gamma;
// u1 is deterministic since it is continuous
for ( i in 1:N ) {
U[i,1] = normal_cdf(ynormal[i], eta1[i], sigma);
U[i,2] = u2[i];
}
// obtain correlation matrix of (u1, u2)
Gamma = tcrossprod(L);
// priors
beta1 ~ multi_normal(beta0normal, Sigma0normal);
beta2 ~ multi_normal(beta0bernoulli, Sigma0bernoulli);
tau ~ gamma(tau_shape, tau_rate);
L ~ lkj_corr_cholesky(1.0); // uniform prior
// add normal likelihood for ynormal
target += normal_lpdf(ynormal | eta1, sigma);
// add on copula likelihood
target += gaussian_copula_lp(U, Gamma);
}
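Just to double-check that the gaussian_copula_lp term above is right, here is a quick numerical check in R (my own addition; it compares the closed form against mvtnorm):
library(mvtnorm)
## log Gaussian copula density = log MVN density minus the univariate normal log densities
Gamma <- matrix(c(1, 0.5, 0.5, 1), 2, 2)
z <- c(0.3, -1.2)
direct <- dmvnorm(z, sigma = Gamma, log = TRUE) - sum(dnorm(z, log = TRUE))
closed <- -0.5 * (log(det(Gamma)) + t(z) %*% (solve(Gamma) - diag(2)) %*% z)
all.equal(direct, as.numeric(closed))  # TRUE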
The R code utilized to simulate the data is
n = 100
Gamma = cbind(c(1, .5), c(.5, 1))
## create uniform random variables via Gaussian copula
U = pnorm( mvtnorm::rmvnorm(n, sigma = Gamma) )
X = cbind(1, rnorm(n, mean = 5, sd = 5/3))
beta = c(-5, 1)
eta = X %*% beta
mu1 = eta
mu2 = binomial()$linkinv(eta)
y1 = qnorm(U[, 1], mean = mu1, sd = 2)
y2 = qbinom(U[, 2], size = 1, prob = mu2)
data = data.frame(y1 = y1, y2 = y2, x = X[, 2])
The regression coefficients looked pretty good, but the correlation was pretty off (posterior mean = 0.82 based on effective sample size of ~500).
Curious if anyone has any ideas about whether something is going awry in my logic.
As an update, if I set minval = 1.0e-100 and maxval = 1.0 - 1.0e-100, I got a divergent transition.
Let me know if anyone spots any obvious reasons why the correlation seems off. I am wondering if it is because the indicator function 1\{a_j \le u_j \le b_j\} is absent from the target probability; I thought that by setting the varying bounds of u2 this would be handled implicitly.
I don't understand your model or any of those papers :D so I just did what I thought made sense (caveat emptor, this may not work in general either), and I'm able to recover the covariance and parameters for this specific simulation. I simplified the problem by removing the covariates. I'm interested to see whether it works well or has problems in general.
First, the simulation. I'm creating a continuous RV from a standard normal and a binary RV with a probability of success of 0.6. The correlation coefficient is \rho = 0.8.
r = 0.8 # correlation coefficient
sigma = matrix(c(1,r,r,1), ncol=2)
s = chol(sigma)
n <- 100
z = t(s) %*% matrix(rnorm(n * 2), nrow = 2) # lower-triangular Cholesky factor times iid standard normals
u = pnorm(z)
success = 1 * (u[2,] > 0.4)
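A quick sanity check on the simulated draws (my addition, not in the original post):
cor(t(z))       # off-diagonal should be near r = 0.8
mean(success)   # should be near 0.6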
stan_dat <- list(
N = n,
y1 = data$y2,
y2 = data$y1
)
mod_mix <- mod$sample(
data = stan_dat,
parallel_chains = 4,
chains = 4,
iter_warmup = 500,
iter_sampling = 1000,
adapt_delta = 0.8,
max_treedepth = 10
)
For the Stan code, I think you should check out my helpful functions repo and use that multivariate normal copula, as it will be much more performant than the naive way. You can see the derivation in the docs of that repo. Anyway, I create a continuous latent parameter that governs how the discrete outcome is generated. It's a pseudo-regression where the continuous latent parameter is on the real line and then transformed via the link function (inv_logit).
functions {
real multi_normal_copula_lpdf(matrix u, matrix L){
int K = rows(u);
int N = cols(u);
real inv_sqrt_det_log = sum(log(diagonal(L)));
matrix[K, N] x = mdivide_left_tri_low(L, u);
return -N * inv_sqrt_det_log - 0.5 * sum(columns_dot_self(x) - columns_dot_self(u));
}
}
data {
int<lower=0> N;
int y1[N];
vector[N] y2;
}
transformed data {
int K = 2; // number of margins (bernoulli + normal)
}
parameters {
vector[N] u;
cholesky_factor_corr[K] L;
}
transformed parameters {
vector[N] mu = inv_logit(u);
}
model {
y1 ~ bernoulli(mu);
y2 ~ std_normal();
u ~ std_normal();
{
matrix[K, N] Y;
for (n in 1:N){
Y[1, n] = inv_Phi(normal_cdf(u[n], 0, 1));
Y[2, n] = inv_Phi(normal_cdf(y2[n], 0, 1));
}
Y ~ multi_normal_copula(L);
}
}
generated quantities {
matrix[K, K] Sigma = multiply_lower_tri_self_transpose(L);
}
The covariance matrix looks like
# A tibble: 4 x 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Sigma[1,1] 1 1 0 0 1 1 NA NA NA
2 Sigma[2,1] 0.836 0.855 0.102 0.0957 0.638 0.964 1.03 168. 277.
3 Sigma[1,2] 0.836 0.855 0.102 0.0957 0.638 0.964 1.03 168. 277.
4 Sigma[2,2] 1 1 0 0 1 1 NA NA NA
I think this may work for the normal-Bernoulli case (is this the Chib representation?), but I'm not sure this works in general.
Seeking a general solution, I wonder if we can utilize the multivariate truncated normal code of @bgoodri.
Suppose we have a J-dimensional vector of which the first J_1 components are from continuous densities and the next J_2 = J - J_1 are from discrete mass functions. Write the "latent" variables as \mathbf{u} = (\mathbf{u}_1', \mathbf{u}_2')' (of course, u_{1j} = F(y_{1j} | \theta_j) is not latent given \mathbf{y}_1). Then we may write the joint likelihood of the observed and latent variables as
f(\mathbf{y}, \mathbf{u}_2 | \boldsymbol{\theta}, \mathbf{\Gamma}) = c(\mathbf{u}_1 | \mathbf{\Gamma}) \, c(\mathbf{u}_2 | \mathbf{u}_1, \mathbf{\Gamma}) \prod_{j=1}^{J_1} f_{1j}(y_{1j} | \theta_{1j}) \prod_{j=1}^{J_2} 1\{a_{2j} \le u_{2j} \le b_{2j}\},
where c(u_2 | u_1) is the conditional Gaussian copula density function, a_{2j} = F(y_{2j} - 1 | \theta_{2j}), and b_{2j} = F(y_{2j} | \theta_{2j}). The key question is basically how we can implement this likelihood in Stan. Note that in Bernoulli models, either a_{2j} = 0 or b_{2j} = 1, so that either \Phi^{-1}(a_{2j}) = -\infty or \Phi^{-1}(b_{2j}) = \infty.
I won't go into the details, but one can show that if you consider the transformation z_{kj} = \Phi^{-1}(u_{kj}) for k = 1, 2, j = 1, \ldots, J_k, we can write the likelihood in terms of \mathbf{z} = (\mathbf{z}_1', \mathbf{z}_2')', where I used the partition
\mathbf{\Gamma} = \begin{pmatrix} \mathbf{\Gamma}_{11} & \mathbf{\Gamma}_{12} \\ \mathbf{\Gamma}_{21} & \mathbf{\Gamma}_{22} \end{pmatrix}.
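Spelling that out (just a sketch of the form I mean; \phi_k below denotes a k-dimensional normal density and z_{1j} = \Phi^{-1}(F_{1j}(y_{1j} | \theta_{1j})) is fixed by the data):
f(\mathbf{y}, \mathbf{z}_2 | \boldsymbol{\theta}, \mathbf{\Gamma}) = \phi_{J_1}(\mathbf{z}_1 | \mathbf{0}, \mathbf{\Gamma}_{11}) \prod_{j=1}^{J_1} \frac{f_{1j}(y_{1j} | \theta_{1j})}{\phi(z_{1j})} \times \phi_{J_2}\!\left(\mathbf{z}_2 \,\middle|\, \mathbf{\Gamma}_{21}\mathbf{\Gamma}_{11}^{-1}\mathbf{z}_1, \; \mathbf{\Gamma}_{22} - \mathbf{\Gamma}_{21}\mathbf{\Gamma}_{11}^{-1}\mathbf{\Gamma}_{12}\right) \prod_{j=1}^{J_2} 1\{\Phi^{-1}(a_{2j}) \le z_{2j} \le \Phi^{-1}(b_{2j})\}.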
Thus, the likelihood can be decomposed as an (untruncated) MVN (which is easy), a product of known density functions (again, easy), and a truncated multivariate normal density, where the bounds change at every iteration. In Ben's code, he mostly covers the case where there is either an upper or a lower bound (although he does offer a note about the case with both an upper and a lower bound). The general solution for a copula would be to allow for both an upper and a lower bound. For example, if we have a Poisson variable and the observed value is nonzero, there would be both an upper and a lower bound on the latent variable.
I think the big challenge is how to handle the varying bounds at each iteration and implement that into the multivariate truncated normal.
Wanted to suggest something like that. Ben's code worked pretty well for multivariate ordinal models (where you have both bounds), so it could, at least in principle, work for others as well.
Thank you for this clear exposition. My background on copulas is recent; I just learned about them a few months ago. I'm not well versed in the literature on mixing discrete with continuous outcomes, so I'm not sure if my approach above is even valid.
I'm interested to see whether you can get the tMVN from Ben to work; I think lots of other people are too.
I didn't want you to go empty-handed, so here's the tMVN that handles both bounds. I modified the signature to take in a vector of lower bounds and a vector of upper bounds. The input s is now just 1 if there are bounds and 0 if not; put in -Inf or +Inf for a side that has no bound.
functions {
vector[] make_stuff(vector mu, matrix L, vector lb, vector ub, vector s, vector u) {
int K = rows(mu);
vector[K] d;
vector[K] z;
vector[K] out[2];
for (k in 1:K) {
int km1 = k - 1;
if (s[k] != 0) {
vector[2] z_star = ([lb[k], ub[k]]' - (mu[k] + ((k > 1) ? L[k, 1:km1] * head(z, km1) : 0))) / L[k, k];
vector[2] u_star = Phi(z_star);
real v = u_star[1] + (u_star[2] - u_star[1]) * u[k];
d[k] = u_star[2] - u_star[1];
z[k] = inv_Phi(v);
}
else {
z[k] = inv_Phi(u[k]);
d[k] = 1;
}
}
out[1] = z;
out[2] = d;
return out;
}
}
data {
int<lower=2> K; // number of dimensions
vector[K] lb; // lower bounds
vector[K] ub; // upper bounds
// s[k] == 0 implies no constraint; otherwise
// s[k] == 1 -> both lb[k] and ub[k] apply
// (use -Inf / +Inf for a side that is unbounded)
vector<lower=0, upper=1>[K] s;
vector[K] mu;
cholesky_factor_cov[K,K] L;
}
parameters {
vector<lower=0,upper=1>[K] u;
}
model {
target += log(make_stuff(mu, L, lb, ub, s, u)[2]); // Jacobian adjustments
// implicit: u ~ uniform(0,1)
}
generated quantities {
vector[K] y = mu + L * make_stuff(mu, L, lb, ub, s, u)[1];
}
library(rstan)
expose_stan_functions("tMVN_both.stan")
K <- 2
rho <- 0.5
Sigma <- matrix(c(1,rho,rho,1), K, K)
lb <- c(1, 1.5)
ub <- c(1.3, 1.9)
mu <- c(0, 0)
L <- t(chol(Sigma))
s <- c(1, 1)
u <- runif(2)
make_stuff(mu, L, lb, ub, s, u)
### returns
> make_stuff(mu, L, lb, ub, s, u)
[[1]]
[1] 1.012789 1.395718
[[2]]
[1] 0.06185477 0.07183871
### if you have no bounds put in Inf like in
lb <- c(-Inf, 1.5)
ub <- c(1.3, 1.9)
> make_stuff(mu, L, lb, ub, s, u)
[[1]]
[1] -0.6108438 2.2556763
[[2]]
[1] 0.90319952 0.01310842
Note
With certain correlations, certain "unlikely" bounds are not respected. Not sure if that's expected or not. For example, the above with -Inf will sometimes give answers that are > 1.9 for the second variate (as in the 2.25 you see). I thought this was an error, but I'm not sure. It may be that the correlation and bounds are invalid together. Something to potentially investigate more.
When there is no correlation, this pathology doesn't appear to exist:
> out <- list()
> for (i in 1:10) {
+ u <- runif(2)
+ out[[i]] <- make_stuff(mu, L, lb, ub, s, u)[[1]]
+ }
> do.call(rbind, out)
[,1] [,2]
[1,] -1.35995114 1.778739
[2,] -2.04867457 1.618372
[3,] -0.25643875 1.516963
[4,] -0.09994477 1.584373
[5,] -1.54324832 1.731444
[6,] 1.07798076 1.726483
[7,] 0.57358502 1.567540
[8,] -1.10508155 1.781897
[9,] 0.01867937 1.707072
[10,] -0.29349804 1.868850
If the correlation is set to 0.05, I get 52/1000 that are over 1.9. If \rho = 0.5 then 572/1000 are over 1.9.
Thanks! In the interim, I did something similar to allow both bounds. In this code, I was able to extract a reasonably close value for the posterior mean of the correlation, and the posterior mean / variance of the regression coefficients matched up exactly with the MLEs.
It's probably not the optimal way to do it (I'm decent at programming, but definitely more on the statistics side of things), but it appears to work. If there's a better way to optimize it, please let me know.
functions{
//-----------------------------------------------------------------
// function to increment log target probability for truncated MVN
// * @param u K-dimensional vector in (0,1)^K
// * @param mu marginal mean of truncated MVN
// * @param L lower cholesky factor of covariance matrix
// * @param lb lower bounds of u's (ignored if lb_ind == 0)
// * @param ub upper bounds of u's (ignored if ub_ind == 0)
//
// * @return K-dim vector giving log target increment
//-----------------------------------------------------------------
vector multi_normal_truncated(vector u, vector mu, matrix L, vector lb, vector ub, vector lb_ind, vector ub_ind) {
int K = rows(u);
vector[K] d;
vector[K] z;
// vector[K] out[2];
for ( k in 1:K ) {
// if unbounded
if ( lb_ind[k] == 0 && ub_ind[k] == 0 ) {
z[k] = inv_Phi(u[k]);
d[k] = 1;
}
else {
int km1 = k-1;
real v;
real z_star;
// lower bound, no upper bound
if ( lb_ind[k] == 1 && ub_ind[k] == 0 ) {
real u_star;
z_star = (lb[k] - (mu[k] + ((k > 1) ? L[k,1:km1] * head(z, km1) : 0) ) ) / L[k,k];
u_star = Phi(z_star);
d[k] = 1 - u_star;
v = u_star + d[k] * u[k]; // v ~ U(u_star, 1)
}
// upper bound, no lower bound
else if ( lb_ind[k] == 0 && ub_ind[k] == 1 ) { // upper bound, no lower bound
real u_star;
z_star = (ub[k] - (mu[k] + ((k > 1) ? L[k,1:km1] * head(z, km1) : 0) ) ) / L[k,k];
u_star = Phi(z_star);
v = u_star * u[k]; // v ~ U(0, u_star)
d[k] = u_star;
}
// both lower and upper bounds
else {
// compute u_star for upper and lower bounds
real u_star_lb;
real u_star_ub;
u_star_lb = Phi( (lb[k] - (mu[k] + ((k > 1) ? L[k,1:km1] * head(z, km1) : 0) ) ) / L[k,k] );
u_star_ub = Phi( (ub[k] - (mu[k] + ((k > 1) ? L[k,1:km1] * head(z, km1) : 0) ) ) / L[k,k] );
d[k] = (u_star_ub - u_star_lb);
v = u_star_lb + (u_star_ub - u_star_lb) * u[k]; // v ~ U(u_star_lb, u_star_ub)
}
z[k] = inv_Phi(v);
}
}
return log(d);
}
//-----------------------------------------------------------------
// function to CHECK whether bernoulli var has upper/lower bound
// * @param y N-dim integer array of 0's and 1's
//
// * @return Nx2 matrix giving indicator of bounds:
// * 1st column is indicator of lb, 2nd is indicator of ub
//-----------------------------------------------------------------
matrix bounds_indicator_bernoulli(int[] y) {
int n = size(y);
matrix[n,2] bound_ind;
for ( i in 1:n ) {
// if success, there is a lower bound but no upper bound
if (y[i] == 1) {
bound_ind[i,1] = 1;
bound_ind[i,2] = 0;
}
// if failure, there is upper bound but no lower bound
else {
bound_ind[i,1] = 0;
bound_ind[i,2] = 1;
}
}
return bound_ind;
}
//------------------------------------------------------------------------
// function to OBTAIN bounds of bernoulli variable given linear predictor
// * @param y N-dim integer array of 0's and 1's
// * @param eta N-dim vector giving linear predictor
//
// * @return Nx2 matrix giving bounds:
// * 1st column is lower bound, 2nd is upper bound
//------------------------------------------------------------------------
matrix bounds_bernoulli(int[] y, vector eta) {
int n = size(y);
matrix[n,2] bounds;
vector[n] mu;
bounds = rep_matrix(0, n, 2);
// get mean
mu = inv_logit(eta); // assume logistic link for time being
// fill in bounds
for ( i in 1:n ) {
// if y = 0, there is an upper bound and no lower bound
if ( y[i] == 0 ) {
bounds[i,2] = inv_Phi( bernoulli_cdf(0, mu[i]) );
}
// if y = 1, there is a lower bound and no upper bound
else {
bounds[i,1] = inv_Phi( bernoulli_cdf(0, mu[i]) );
}
}
return bounds;
}
}
data {
int<lower=0> N;
int<lower=0> pnormal;
int<lower=1> pbernoulli;
real ynormal[N];
int<lower=0,upper=1> ybernoulli[N];
matrix[N,pnormal] Xnormal;
matrix[N,pbernoulli] Xbernoulli;
vector[pnormal] beta0normal;
matrix[pnormal, pnormal] Sigma0normal;
real<lower=0> tau_shape;
real<lower=0> tau_rate;
vector[pbernoulli] beta0bernoulli;
matrix[pbernoulli,pbernoulli] Sigma0bernoulli;
}
transformed data {
// obtain matrix giving whether ybernoulli has upper/lower bound
matrix[N,2] u2_bounds_ind;
vector[2] zeros = rep_vector(0, 2);
vector[2] mu = zeros;
u2_bounds_ind = bounds_indicator_bernoulli(ybernoulli);
}
parameters {
vector[pnormal] beta1; // reg coefs for gaussian variable
real<lower=0> tau; // inverse variance for gaussian var
vector[pbernoulli] beta2; // reg coefs for bernoulli variable
real<lower=0,upper=1> u2[N]; // latent variable for bernoulli
cholesky_factor_corr[2] L; // Cholesky decomposition of 2x2 correlation matrix
}
model {
real sigma = sqrt(inv(tau)); // standard deviation for normal variable
vector[N] eta1 = Xnormal * beta1; // linear predictor for normal variable
vector[N] eta2 = Xbernoulli * beta2; // linear predictor for bernoulli variable
matrix[N, 2] u2bounds; // matrix storing bounds for uniform generated variable for bernoulli
vector[2] lb; // lower bound for u's
vector[2] ub; // upper bound for u's
vector[2] lb_ind; // indicates if there is a lower bound (first element 0 since continuous)
vector[2] ub_ind; // indicates if there is an upper bound (first element 0 since continuous)
// initialize vectors with 0s
lb = zeros; // lower bound of u's (ignored if lb_ind == 0)
ub = zeros; // upper bound of u's (ignored if ub_ind == 0)
lb_ind = zeros; // indicates if there is a lower bound for u's
ub_ind = zeros; // indicates if there is an upper bound for u's
// get bounds for u2
u2bounds = bounds_bernoulli(ybernoulli, eta2);
// priors
beta1 ~ multi_normal(beta0normal, Sigma0normal);
beta2 ~ multi_normal(beta0bernoulli, Sigma0bernoulli);
tau ~ gamma(tau_shape, tau_rate);
L ~ lkj_corr_cholesky(1.0); // uniform prior
ynormal ~ normal(Xnormal * beta1, sigma);
for ( i in 1:N ) {
vector[2] u;
u[1] = Phi_approx((ynormal[i] - eta1[i]) / sigma);
u[2] = u2[i];
lb[2] = u2bounds[i,1];
ub[2] = u2bounds[i,2];
lb_ind[2] = u2_bounds_ind[i,1];
ub_ind[2] = u2_bounds_ind[i,2];
target += multi_normal_truncated(
u, mu, L, lb, ub, lb_ind, ub_ind
);
}
}
generated quantities {
corr_matrix[2] Gamma;
Gamma = tcrossprod(L);
}
And your code has the same problem as mine
K <- 2
rho <- -.5
Sigma <- matrix(c(1,rho,rho,1), K, K)
lb <- c(1.5, 1.5)
ub <- c(1.3, 1.9)
mu <- c(0, 0)
L <- t(chol(Sigma))
s <- c(1, 1)
out <- list()
lb_ind <- c(1, 1)
ub_ind <- c(1, 1)
multi_normal_truncated(mu, L, lb, ub, lb_ind, ub_ind, u)
[1] 1.429983 2.750805
I'm AFK at the moment, but it looks like your lower bound for the first element is larger than your upper bound. Does that change anything?
No, it does not, but you'd think it would! Anyway, I tested some more and our code is equivalent.
Edit: See the following post; you cannot just use expose_stan_functions to generate random variates, as this does not take the Jacobian into account. These plots just show what happens when there is no Jacobian!
K <- 2
rho <- -.5
Sigma <- matrix(c(1,rho,rho,1), K, K)
lb <- c(1, 1.5)
ub <- c(1.3, 1.9)
mu <- c(0, 0)
L <- t(chol(Sigma))
s <- c(1, 1)
out <- list()
lb_ind <- c(1, 1)
ub_ind <- c(1, 1)
out_sp <- list()
out_tar <- list()
for (i in 1:10000) {
u <- runif(2)
out_sp[[i]] <- make_stuff(mu, L, lb, ub, s, u)[[1]]
out_tar[[i]] <- multi_normal_truncated(mu, L, lb, ub, lb_ind, ub_ind, u)
}
out_sp <- do.call(rbind, out_sp)
out_tar <- do.call(rbind, out_tar)
which(out_sp != out_tar)
> integer(0)
The interesting part is when I plot the output. It should be in a box bounded by those lower and upper bounds, however...
If instead I set the correlation to 0 it satisfies those bounds
Plotting the full MVN with rho = -0.5 and the box of bounds does show density there
Now, if I remove the lower bound (your code and mine are indeed equivalent, which I checked again; though with mine you can remove the extra *_ind things and shorten the code, but I digress), here's the plot of the output, with the grayed-out portion showing where the bounds are violated:
I guess "there's something strange in the neighborhood..."
Once again the zero-correlation plot satisfies those bounds, so it's an issue with the correlation adjustment.
To add insult to injury, for that first plot the correlation appears to be positive and not negative.
Not only does it not obey the bounds, it doesn't obey the correlation.
Yea, but the key missing component is that the jacobian updates the space. So what I showed above is without the jacobian adjustment. Indeed when I sample with the adjustment things turn out nicely.
Ooh, multivariate stats is beautiful (rho = -0.5)
And rho = -0.8
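For completeness, this is roughly how the sampling goes (a sketch with made-up settings, not the exact script; the point is that sampling the model itself applies the target += log(d) term, unlike pushing runif() draws through make_stuff()):
library(rstan)
rho <- -0.5
Sigma <- matrix(c(1, rho, rho, 1), 2, 2)
fit <- stan(
  file = "tMVN_both.stan",
  data = list(K = 2, lb = c(1, 1.5), ub = c(1.3, 1.9),
              s = c(1, 1), mu = c(0, 0), L = t(chol(Sigma))),
  chains = 4, iter = 2000
)
y_draws <- rstan::extract(fit, "y")$y  # draws of the bounded variate; these respect the box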
@tarheel do you have the derivation of your formula above or a paper to point to? That may also have the general case when you have J discrete and K continuous margins?
The derivation is from that Smith and Khaled (2012) paper I pointed to earlier. Check out Section 6.1 and Proposition 3; in particular, you can combine equation (6.1) and the representation of f(u_D | x) in bullet point (i) of that proposition to get the augmented likelihood.
Also be sure to check out Equation (2.3) (for the fully discrete setting) as that sets the basis for the mixed setting.
The difference between this and their paper is that they're focused on Gibbs-like sampling, so they work out the full conditionals and so on. Luckily, that doesn't need to be done (a distinct advantage of Stan for copulas, provided the likelihood can be implemented).
The copula is decomposed there as
c(u_C, u_D) = c(u_C) c(u_D | u_C)
but there's no reason to decompose in general. We just need the joint likelihood!
@tarheel did most of the heavy lifting; I just packaged it up with a bow. Here's the discrete-continuous multivariate Gaussian case. It uses the truncated MVN to derive the bounds from the latent continuous margins needed for the discrete margins.
It's in PR status, but you can check it out at
added mixed_copula by spinkney · Pull Request #13 · spinkney/helpful_stan_functions (github.com)
You should be able to add Gaussian, Bernoulli, and Poisson margins, or leave any one of those out, and the code should run. If you want to add different margins, it's a bit of work but doable. Ideally there'd be a functor that could take in the margin distribution and derive the outcome. Until then you'll have to add it; PRs are welcome in the helpful functions repo :)
Where can I find Ben's code? I searched on Google but found nothing.