My first idea would be to use J observation-level latent variables, model those as multivariate normal, and plug them into the formulae that generate the parameters for your J responses. So:
L_i \sim N_J(0,\Sigma)
where L has N rows and J columns, and \Sigma is a J \times J correlation matrix. You'd also want to fit J scale parameters \sigma_j.
Then, for each response, the location parameter µ is given by µ_{ij} = f(L_{ij}\sigma_j).
This induces correlation between the non-normal responses at the level of individual observations.
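As a quick sanity check of that idea (pure simulation, no Stan; the parameter values here are just illustrative), correlated latent normals pushed through different observation models do induce dependence between a lognormal and a negative binomial response:

```r
library(MASS)  # for mvrnorm

set.seed(1)
n <- 2000
Sigma <- matrix(c(1, 0.8, 0.8, 1), 2, 2)  # latent correlation matrix
L <- mvrnorm(n, mu = c(0, 0), Sigma = Sigma)
sigmaL <- c(1, 0.5)                        # latent scales
y1 <- rlnorm(n, meanlog = L[, 1] * sigmaL[1], sdlog = 0.3)
y2 <- rnbinom(n, mu = exp(L[, 2] * sigmaL[2]), size = 5)
cor(y1, y2, method = "spearman")  # clearly positive, inherited from Sigma
```

The rank correlation between y1 and y2 is well above zero despite the two margins being lognormal and negative binomial.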
Here's a toy example in cmdstanr. I had to put a fairly strong prior on sigma1 to keep it from getting stuck at zero, since in this toy example the residuals and the random effects aren't otherwise well identified.
data {
  int<lower=1> N;
  int<lower=1> J;
  array[N] real y1;
  array[N] int y2;
}
parameters {
  // observation-level noise parameters
  real<lower=0> sigma1;
  real<lower=0> phi2;
  // latent variables, one J-vector per observation
  array[N] vector[J] L;
  cholesky_factor_corr[J] cholcor;
  // scales of the latent variables
  vector<lower=0>[J] sigmaL;
}
model {
  phi2 ~ exponential(2);           // negative binomial dispersion
  sigma1 ~ gamma(6, 10);           // lognormal scale
  sigmaL ~ exponential(1);         // scales of latent variables
  cholcor ~ lkj_corr_cholesky(2);  // correlations of latent variables
  L ~ multi_normal_cholesky(rep_vector(0, J), cholcor);  // latent variables
  y1 ~ lognormal(to_vector(L[1:N, 1]) * sigmaL[1], sigma1);          // y1 is lognormal
  y2 ~ neg_binomial_2(exp(to_vector(L[1:N, 2]) * sigmaL[2]), phi2);  // y2 is negative binomial
}
generated quantities {
  array[N] real y1out = lognormal_rng(to_vector(L[1:N, 1]) * sigmaL[1], sigma1);
  array[N] int y2out = neg_binomial_2_rng(exp(to_vector(L[1:N, 2]) * sigmaL[2]), phi2);
}
library(cmdstanr)
library(bayesplot)
library(posterior)
library(tidyverse)
# Generate some continuous and integer-valued correlated data
cormat <- matrix(c(1,0.8,0.8,1),ncol=2)
y <- MASS::mvrnorm(50, c(0, 0), cormat)
y <- exp(y)
y[,2] <- round(y[,2])
datlist <- list(
y1 = y[,1],
y2 = as.integer(y[,2]),
N = NROW(y),
J = NCOL(y)
)
model <- cmdstan_model("scripts/latent_re.stan")
fit <- model$sample(data = datlist, parallel_chains = 4, adapt_delta = 0.99, max_treedepth = 15)
I show an example with 4 random variables — normal, Gumbel, Weibull, and lognormal — all correlated with each other.
The R code to simulate and run everything is also in that repo.
This works for continuous margins. The negative binomial takes more work, but @tarheel has done the hard work of writing code for mixed continuous and discrete margins using a Gaussian copula.
The idea behind Gaussian copulas is basically this: there's a latent variable Z that is multivariate normal with mean 0 and a correlation matrix. Each margin of Z is then transformed to (0,1) by applying the standard normal CDF. We then apply the inverse CDF of whatever distribution your data follow (e.g. log-normal, negative binomial).
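A minimal simulation of that recipe (all parameter values here are just for illustration):

```r
library(MASS)  # for mvrnorm

set.seed(2)
n <- 5000
rho <- 0.7
# 1. Latent multivariate normal with unit variances and correlation rho
Z <- mvrnorm(n, mu = c(0, 0), Sigma = matrix(c(1, rho, rho, 1), 2, 2))
# 2. Standard normal CDF sends each margin to Uniform(0,1)
U <- pnorm(Z)
# 3. Inverse CDFs impose the desired margins
y1 <- qlnorm(U[, 1], meanlog = 0, sdlog = 1)  # log-normal margin
y2 <- qnbinom(U[, 2], size = 5, mu = 3)       # negative binomial margin
cor(y1, y2, method = "spearman")  # rank correlation close to the latent rho
```

The rank correlation survives the marginal transforms because steps 2 and 3 are monotone.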
For continuous marginals, the inverse CDF is one-to-one, so given Y, the latent variable pertaining to that margin isn’t really latent.
For discrete marginals, the inverse CDF is a one-to-many function, so they really are latent (but we know the bounds of the latent variable) and must be sampled in the sampling scheme.
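To make the "known bounds" concrete: if an observed count y has marginal CDF F, the latent standard normal for that margin can only be pinned down to the interval (qnorm(F(y - 1)), qnorm(F(y))]. A small sketch with an illustrative negative binomial margin:

```r
# Bounds on the latent standard normal z given an observed count y:
# qnorm(F(y - 1)) < z <= qnorm(F(y))
y <- 4
size <- 5; mu <- 3  # negative binomial parameters (illustrative)
lower <- qnorm(pnbinom(y - 1, size = size, mu = mu))
upper <- qnorm(pnbinom(y, size = size, mu = mu))
c(lower, upper)  # any z in this interval maps back to the same y
```

In a sampler, z is treated as a parameter constrained to this interval, which is exactly why the discrete case needs extra machinery.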
The discrete/mixed case is significantly more difficult to work with than the continuous case, but hopefully you can figure out what's going on in the mixed code provided by @spinkney.
Something that would be really useful is to add functors that take in distributions. Right now the mixed case is written for Poisson and Bernoulli discrete margins. However, it would be cool to have a kernel-like functor that takes in the distribution and handles everything, knowing that the marginal is distributed as x.
I looked at copulas yesterday but wasn’t understanding how they worked. That example model makes it a lot clearer how they’re used so maybe I can dig into the Math papers now.
@spinkney what’s the link to the mixed continuous/discrete code?