Would you be able to write up a Cvine example with 3 marginals in Stan?
Yes, I think I will.
Iāve attached some files (you can save to a folder and just adjust the working directory in the file that starts with āS_ā) to compare a few different ways of fitting multivariate normals in Stan: 1) Stanās multi_normal_cholesky, 2) an individual regression approach Iāve discussed on the forums previously, 3) a partial correlation approach that uses multi_normal, and 4) the Cvine copula approach (which is also copied below and borrows from your copula code).
The true correlation matrix is set up so that the partial correlation of variables 2 and 3 given variable 1 is zero.
Iām using slightly different priors, so they arenāt exactly comparable, but they all give moreorless the same results. The individual regressions approach is the fastest (chains run around 0.5s or so on my machine with my settings) and results in the highest effective samples. I directly calculated the covariance matrix, instead of inverting a matrix which probably works better when you get more complicated. The vine approach is the second fastest (chains running around 1.52s). The multinormal cholesky chains take around 510s and the partial correlation version around 20s (I did this one since the output is comparable to the Cvine and the difference in the runtime reflects the slowdown as a result of using the multinormal instead of the bivariate).
Anyway, the Cvine example below is specifically set up for normal marginals and a normal copula. Things get a little more complicated when you move away from this (I start getting a bit beyond my skis, but I found this explanation pretty good: https://tnagler.github.io/vinearisa.pdf).
In the more general case, the copula here would take the conditional CDFs of the variables, so it looks something like c_231(F_21(x2x1),F_31(x3x1)). F_21(x2x1) is the conditional CDF and is calculated by taking the derivative with respect to x1 of the c12 copula while fixing F_1(x1) values. It simplifies nicely for Gaussians, but not more generally as you could imagine.
There is an R package VineCopula that looks good, though it looks the bleeding edge is a C++ implementation called vinecopulib (with R/Python interfaces).
functions {
real normal_copula_vector_lpdf(vector u, vector v, real rho) {
int N = num_elements(u);
real rho_sq = square(rho);
real a1 = 0.5 * rho;
real a2 = rho_sq  1;
real a3 = 0.5 * log1m(rho_sq);
real x = 2 * u' * v + rho * (dot_self(u) + dot_self(v));
return a1 * x / a2  N * a3;
}
}
data {
int<lower=0> T;
matrix[T, 3] X;
}
transformed data {
int N = 3;
}
parameters {
vector[N] mu;
vector<lower=0>[N] sigma;
real<lower=0,upper=1> p1_raw;
real<lower=0,upper=1> p12_raw;
real<lower=0,upper=1> p13_raw;
real<lower=0,upper=1> p23_given_1_raw;
}
transformed parameters {
real<lower=1,upper=1> p12;
real<lower=1,upper=1> p13;
real<lower=1,upper=1> p23_given_1;
p12 = (p12_raw  0.5) * 2;
p13 = (p13_raw  0.5) * 2;
p23_given_1 = (p23_given_1_raw  0.5) * 2;
}
model {
// priors
mu ~ normal(0.15, 0.15);
sigma ~ cauchy(0, 0.3);
p1_raw ~ beta(19, 1);
p12_raw ~ normal(p1_raw, 0.1);
p13_raw ~ normal(p1_raw, 0.1);
p23_given_1_raw ~ beta(1, 1);
// univariate
for (i in 1:3) {
X[, i] ~ normal(mu[i], sigma[i]);
}
{
matrix[T, N] Uinv;
matrix[T, N  1] Uinv_2nd_level;
for (i in 1:N) {
// Since assuming normal marginals:
// U = Phi((x  mu) / sigma), and
// U_inv = inv_Phi(U) = (x  mu) / sigma
Uinv[, i] = (X[, i]  mu[i]) / sigma[i];
}
// fit first level
Uinv[, 2] ~ normal_copula_vector(Uinv[, 1], p12);
Uinv[, 3] ~ normal_copula_vector(Uinv[, 1], p13);
// fit second level
// The general appraoch is to get each conditional CDF. This requires
// taking the derivative of the copula CDF when holding one variable
// constant. Things are simpler here because I'm using normal marginals
// and copulae, but this doesn't hold generally. What I'm doing here
// is essentially regressing variables 2 and 3 on variable 1 and
// and dividing by the residual standard deviation. Since we have
// standardized the data above, it simplifies the calculations here
// and I don't need to do the regression because the correlations are
// reused from above. Works for normals, but not more generally.
Uinv_2nd_level[, 1] = (Uinv[, 2]  p12 * Uinv[, 1]) / sqrt(1  square(p12));
Uinv_2nd_level[, 2] = (Uinv[, 3]  p13 * Uinv[, 1]) / sqrt(1  square(p13));
Uinv_2nd_level[, 2] ~ normal_copula_vector(Uinv_2nd_level[, 1], p23_given_1);
}
}
generated quantities {
real p23;
p23 = p12 * p13 + p23_given_1 * sqrt(1  square(p12)) * sqrt(1  square(p13));
}
S_Simple_Cvine_Copula.r (4.6 KB)
simple_cvine_copula.stan (2.8 KB)
simple_multinormal.stan (675 Bytes)
simple_multinormal_indv_regressions.stan (1.5 KB)
simple_multinormal_partials.stan (1.3 KB)
Thanks for putting this together! Itās nice to see these compared, probably a really nice overview paper or case study. Do you have any literature for the individual regressions approach? Seeing the math would be helpful for generalization, though it must just follow from the conditional normal.
Iām interested in nonnormal marginals for the vine, do you have any examples using parametric bivariate constructors with nonnormal marginals? The reference you provided is good. If I wasnāt so busy (or, maybe, lazy :)) Iād work out a few conditional copulaes from it. Iād appreciate anything you may have to help get started here.
As for the speed of the partials and multinormal, I believe youāll get a more fair comparison if you vectorize over T
. The derivatives are really slowing down because theyāre being calculated on each row of data instead of 1 time for the entire vector. I didnāt test the code but writing the simple_multinormal.stan
with an array of vectors for the data (I also assigned directly to objects which is ever so slightly faster) allows the removal of the T loop. Curious how this changes the timing on your machine. The same update may be applied to the simple_multinormal_partials.stan
code.
data {
int<lower=0> T;
array[T] vector[3] X;
}
transformed data {
int N = 3;
}
parameters {
vector[N] mu;
vector<lower=0>[N] L_sigma;
cholesky_factor_corr[N] L_Omega;
}
model {
// priors
mu ~ normal(0.15, 0.15);
L_sigma ~ cauchy(0, 0.3);
L_Omega ~ lkj_corr_cholesky(1);
// model
{
matrix[N, N] L_Sigma = diag_pre_multiply(L_sigma, L_Omega);
X ~ multi_normal_cholesky(mu, L_Sigma);
}
}
generated quantities {
matrix[N, N] Sigma = multiply_lower_tri_self_transpose(diag_pre_multiply(L_sigma, L_Omega));
}
I could write something up, though I wrote up a case study last year and after writing it I havenāt heard much since then.
The individual regressions approach is really straightforward when you think about it in state space terms. In Econometrics, they have this concept of a structural VAR. If you remove the VAR aspect, then it kind of flows from if you assume a specific decomposition of the covariance matrix. Here (161.5 KB) is an incomplete draft of some notes I wrote up on it last year (before getting distracted with other things).
I think vines with nonnormal marginals and normal copulas would work fine since the tricky part is the conditional CDF and the marginals are all converted to cumulative CDFs. I make no promises about nonnormal copulas. That being said, I would need to do some tests to make sure I am right about this. A writeup could incorporate this.
In terms of other materials to understand them. Vine copulas certainly werenāt an easy subject for me to grok. I came back to them on and off a few times before I really understood what was going on (and Iām still no expert). This presentation (http://www.columbia.edu/~rf2283/Conference/2Models%20(1)%20Seagers.pdf) covers vines at the end in a pretty clear manner that helped me understand.
With respect to your example, I was going to try that, but I got lazy. It certainly would be what I would want to do in any writeup. The problem ultimately is that multi_normal
and multi_normal_cholesky
donāt scale well, even if you vectorize. Being able to show that in a writeup would be interesting.
The necessary matrix inversion in the multinormal code really hurts scalability. In addition, Iāve seen issues with high dimensional construction of the Cholesky factor which is fixed by using the onion method.
I think it would be cool to generalize the regression approach for any dimension with a function. It would be really useful!
I found a masterās thesis (https://mediatum.ub.tum.de/doc/1694263/document.pdf) which did vine copulas in Stan but the relevant Stan functions are missing from the doc. Gives a good overview of vines and how to use them.
In the rvinecopulib
they reference Aas, K., C. Czado, A. Frigessi, and H. Bakken (2009). Paircopula constructions of multiple dependence. Insurance: Mathematics and Economics 44 (2), 182198. Thereās a bunch of conditional functions, they call h
functions in the appendix for bivariate copulas.
I think it is similar to the onion method. I might have a generalized version of the regression approach somewhere. Iāll need to look for it.
The papers look to be good references. Glad to see I was able to derive the Gaussian hfunction on my own. This makes it a bit clearer that the only difference for the joint distribution if you have different marginal distributions would be that you need a step to calculate U for each and then get Uinv=inv_Phi(U)
. Everything else should be the same as what I have if youāre using bivariate Gaussian copulae.
When I vectorize everything, the elapsed times look like below. fit_1
is the multi_normal, fit_2
is individual regressions, fit_3
uses partial correlations, fit_4
uses copulas. It doesnāt really change the conclusions, just the magnitudes of everything
> mean(get_elapsed_time(fit_1)[, 1] + get_elapsed_time(fit_1)[, 2])
[1] 4.48875
> mean(get_elapsed_time(fit_2)[, 1] + get_elapsed_time(fit_2)[, 2])
[1] 0.4755
> mean(get_elapsed_time(fit_3)[, 1] + get_elapsed_time(fit_3)[, 2])
[1] 8.46725
> mean(get_elapsed_time(fit_4)[, 1] + get_elapsed_time(fit_4)[, 2])
[1] 2.272
This is a version of the multivariate normal with individual regressions above that works for any size input. Itās a little bit faster than the one above too, for some reason. I think I had an earlier forum post where I had a version of this that was fancier (included QR decompositions and normalized all the variables). This one is less complicated so itās a bit easier to understand (the other one might have a typo in it too, but this produces pretty much the exact same mean/covariance as the other ones).
data {
int<lower=0> T;
int<lower=0> N;
vector[T] X[N];
}
transformed data {
matrix[T, N] X_mat;
int k[N];
X_mat[, 1] = X[1];
k[1] = 0;
for (i in 2:N) {
k[i] = k[i  1] + (i  1);
X_mat[, i] = X[i];
}
}
parameters {
vector[N] alphas;
vector[k[N]] betas;
vector<lower=0>[N] sigmas;
}
model {
// priors
alphas[1] ~ normal(0.15, 0.15);
alphas[2:N] ~ normal(0, 0.15);
betas ~ normal(1, 1);
sigmas ~ cauchy(0, 0.3);
X[1] ~ normal(alphas[1], sigmas[1]);
{
for (i in 2:N) {
X[i] ~ normal(alphas[i] + X_mat[, 1:(i  1)] * betas[(k[i  1] + 1):k[i]], sigmas[i]);
}
}
}
generated quantities {
vector[N] mu;
matrix[N, N] Sigma;
{
matrix[N, N] B;
matrix[N, N] L;
for (i in 1:N) {
for (j in 1:N) {
if (j >= i) {
B[i, j] = 0;
} else {
B[i, j] = betas[k[i  1] + j];
}
}
}
L = B;
for (i in 1:N) {
L[i, i] = L[i, i] + 1;
}
L = inverse(L);
mu = L * alphas;
Sigma = multiply_lower_tri_self_transpose(diag_post_multiply(L, sigmas));
}
}
Thanks John and very cool stuff! Iām going to test this out in my programs. Can you wrap it in a function for easier reuse?
Iām curious how this may help with multivariate priors and other things that rely upon the multivariate normal.
I wonder if the multivariate student t may be able to be expressed similarly but would need to go through the math to verify.
See below. Iām still using rstan 2.21, so I donāt have access to the to_int function, which makes some things a little more awkward (I vaguely recall being able to cast before, but I wasnāt able to get it working this time around).
I donāt precisely know for sure on Student t either.
functions {
real multi_normal_indv_regressions_lpdf(vector x, vector alphas, vector betas, vector sigmas) {
int N = num_elements(x);
real output = 0;
int k_prior = 0;
int k_current = 0;
output = output + normal_lpdf(x[1]  alphas[1], sigmas[1]);
for (i in 2:N) {
k_current = k_prior + i  1;
output = output + normal_lpdf(x[i]  alphas[i] + dot_product(x[1:(i  1)], betas[(k_prior + 1):k_current]), sigmas[i]);
k_prior = k_current;
}
return output;
}
real multi_normal_indv_regressions_matrix_lpdf(matrix X, vector alphas, vector betas, vector sigmas) {
int N = num_elements(alphas);
real output = 0;
int k_prior = 0;
int k_current = 0;
output = output + normal_lpdf(X[, 1]  alphas[1], sigmas[1]);
for (i in 2:N) {
k_current = k_prior + i  1;
output = output + normal_lpdf(X[, i]  alphas[i] + X[, 1:(i  1)] * betas[(k_prior + 1):k_current], sigmas[i]);
k_prior = k_current;
}
return output;
}
matrix convertCoefficientVectorToCholesky(vector betas, int N) {
//int N = to_int(0.5 * (sqrt(1 + 8 * num_elements(betas))  1)); // using quadratic formula to solve 0.5*x*(x+1)=num_elements(x)
matrix[N, N] B;
matrix[N, N] L;
int k_prior = 0;
int k_current = 0;
for (i in 1:N) {
k_current = k_prior + i  1;
for (j in 1:N) {
if (j >= i) {
B[i, j] = 0;
} else {
B[i, j] = betas[k_prior + j];
}
}
k_prior = k_current;
}
L = B;
for (i in 1:N) {
L[i, i] = L[i, i] + 1;
}
return inverse(L);
}
}
data {
int<lower=0> T;
int<lower=0> N;
matrix[T, N] X;
}
transformed data {
int k[N];
k[1] = 0;
for (i in 2:N) {
k[i] = k[i  1] + (i  1);
}
}
parameters {
vector[N] alphas;
vector[k[N]] betas;
vector<lower=0>[N] sigmas;
}
model {
// priors
alphas[1] ~ normal(0.15, 0.15);
alphas[2:N] ~ normal(0, 0.15);
betas ~ normal(1, 1);
sigmas ~ cauchy(0, 0.3);
X ~ multi_normal_indv_regressions_matrix(alphas, betas, sigmas);
}
generated quantities {
vector[N] mu;
matrix[N, N] Sigma;
{
matrix[N, N] L;
L = convertCoefficientVectorToCholesky(betas, N);
mu = L * alphas;
Sigma = multiply_lower_tri_self_transpose(diag_post_multiply(L, sigmas));
}
}
@jmh530 this is awesome!
I have two seemingly āsimpleā questions in the grand scheme of things:

Is there any reason to use the regression approach over the C Vine or vice versa? I particular ask in reference to the ability to learn the dependence structure underlying 54 variables. I have fit a Gaussian copula, but you mentioned the vine is more feasible in higher dimensionsā¦

Given that, do you know of a way to generalize your c vine code to 54 traitsā¦ i suppose im asking in terms of the most efficient way to do so in Stanā¦
What the C vine gets you is flexibility. You can choose each pair copula and all the marginals to be whatever distribution you want. The regression approach you could probably make alternate assumptions about the residuals (using a tdistribution for instance), but then you would no longer be able to do an analytic covariance matrix. You also are enforcing a linear relationship between the variables without any transformations that the C vine approach would allow. So if your data is approximately normal and linear, the regression approach could be a good starting point, but that might not be a good fit for everyone.
In terms of generalizing it, it shouldnāt be hard to generalize, I just havenāt gotten around to it. The problem is that it is much easier to express models in Stan when you know in advance exactly what the distributions you want to fit are. Some of the frequentist implementations of vines will have an option to fit multiple different dependence structures and then go with the best ones. To do something like that with Stan would require fitting lots of different models and if your data is large then it might take some time to do that. Being able to simplify the problem somehow will usually help in terms of running time when doing stuff with Stan. Itās usually better to start with a simple model and then make more complicated.