Vine copula examples in Stan

Would you be able to write up a C-vine example with 3 marginals in Stan?

2 Likes

Yes, I think I will.

1 Like

I’ve attached some files (you can save to a folder and just adjust the working directory in the file that starts with “S_”) to compare a few different ways of fitting multivariate normals in Stan: 1) Stan’s multi_normal_cholesky, 2) an individual regression approach I’ve discussed on the forums previously, 3) a partial correlation approach that uses multi_normal, and 4) the C-vine copula approach (which is also copied below and borrows from your copula code).

The true correlation matrix is set up so that the partial correlation of variables 2 and 3 given variable 1 is zero.

I’m using slightly different priors, so they aren’t exactly comparable, but they all give more-or-less the same results. The individual regressions approach is the fastest (chains run around 0.5s or so on my machine with my settings) and results in the highest effective samples. I directly calculated the covariance matrix, instead of inverting a matrix which probably works better when you get more complicated. The vine approach is the second fastest (chains running around 1.5-2s). The multinormal cholesky chains take around 5-10s and the partial correlation version around 20s (I did this one since the output is comparable to the Cvine and the difference in the run-time reflects the slowdown as a result of using the multi-normal instead of the bivariate).

Anyway, the C-vine example below is specifically set up for normal marginals and a normal copula. Things get a little more complicated when you move away from this (I start getting a bit beyond my skis, but I found this explanation pretty good: https://tnagler.github.io/vine-arisa.pdf).

In the more general case, the copula here would take the conditional CDFs of the variables, so it looks something like c_23|1(F_2|1(x2|x1),F_3|1(x3|x1)). F_2|1(x2|x1) is the conditional CDF and is calculated by taking the derivative with respect to x1 of the c12 copula while fixing F_1(x1) values. It simplifies nicely for Gaussians, but not more generally as you could imagine.

There is an R package VineCopula that looks good, though it looks the bleeding edge is a C++ implementation called vinecopulib (with R/Python interfaces).

functions {
    real normal_copula_vector_lpdf(vector u, vector v, real rho) {
      int N = num_elements(u);
      real rho_sq = square(rho);
      
      real a1 = 0.5 * rho;
      real a2 = rho_sq - 1;
      real a3 = 0.5 * log1m(rho_sq);
      real x = -2 * u' * v + rho * (dot_self(u) + dot_self(v));
      
      return a1 * x / a2 - N * a3;
    }
}
data {
    int<lower=0> T;
    matrix[T, 3] X;
}
transformed data {
    int N = 3;
}
parameters {
    vector[N] mu;
    vector<lower=0>[N] sigma;
    real<lower=0,upper=1> p1_raw;
    real<lower=0,upper=1> p12_raw;
    real<lower=0,upper=1> p13_raw;
    real<lower=0,upper=1> p23_given_1_raw;
}
transformed parameters {
    real<lower=-1,upper=1> p12;
    real<lower=-1,upper=1> p13;
    real<lower=-1,upper=1> p23_given_1;
    p12 = (p12_raw - 0.5) * 2;
    p13 = (p13_raw - 0.5) * 2;
    p23_given_1 = (p23_given_1_raw - 0.5) * 2;
}
model {
    // priors
    mu ~ normal(0.15, 0.15);
    sigma ~ cauchy(0, 0.3);
    p1_raw ~ beta(19, 1);
    p12_raw ~ normal(p1_raw, 0.1);
    p13_raw ~ normal(p1_raw, 0.1);
    p23_given_1_raw ~ beta(1, 1);

    // univariate
    for (i in 1:3) {
        X[, i] ~ normal(mu[i], sigma[i]);
    }
    {
        matrix[T, N] Uinv;
        matrix[T, N - 1] Uinv_2nd_level;
        for (i in 1:N) {
            // Since assuming normal marginals:
            // U = Phi((x - mu) / sigma), and
            // U_inv = inv_Phi(U) = (x - mu) / sigma
            Uinv[, i] = (X[, i] - mu[i]) / sigma[i];
        }
        // fit first level
        Uinv[, 2] ~ normal_copula_vector(Uinv[, 1], p12);
        Uinv[, 3] ~ normal_copula_vector(Uinv[, 1], p13);
        // fit second level
        // The general appraoch is to get each conditional CDF. This requires
        // taking the derivative of the copula CDF when holding one variable
        // constant. Things are simpler here because I'm using normal marginals
        // and copulae, but this doesn't hold generally. What I'm doing here
        // is essentially regressing variables 2 and 3 on variable 1 and
        // and dividing by the residual standard deviation. Since we have
        // standardized the data above, it simplifies the calculations here
        // and I don't need to do the regression because the correlations are
        // re-used from above. Works for normals, but not more generally.
        Uinv_2nd_level[, 1] = (Uinv[, 2] - p12 * Uinv[, 1]) / sqrt(1 - square(p12));
        Uinv_2nd_level[, 2] = (Uinv[, 3] - p13 * Uinv[, 1]) / sqrt(1 - square(p13));
        Uinv_2nd_level[, 2] ~ normal_copula_vector(Uinv_2nd_level[, 1], p23_given_1);
    }
}
generated quantities {
    real p23;
    p23 = p12 * p13 + p23_given_1 * sqrt(1 - square(p12)) * sqrt(1 - square(p13));
}

S_Simple_Cvine_Copula.r (4.6 KB)
simple_cvine_copula.stan (2.8 KB)
simple_multinormal.stan (675 Bytes)
simple_multinormal_indv_regressions.stan (1.5 KB)
simple_multinormal_partials.stan (1.3 KB)

3 Likes

Thanks for putting this together! It’s nice to see these compared, probably a really nice overview paper or case study. Do you have any literature for the individual regressions approach? Seeing the math would be helpful for generalization, though it must just follow from the conditional normal.

I’m interested in non-normal marginals for the vine, do you have any examples using parametric bivariate constructors with non-normal marginals? The reference you provided is good. If I wasn’t so busy (or, maybe, lazy :)) I’d work out a few conditional copulaes from it. I’d appreciate anything you may have to help get started here.

As for the speed of the partials and multinormal, I believe you’ll get a more fair comparison if you vectorize over T. The derivatives are really slowing down because they’re being calculated on each row of data instead of 1 time for the entire vector. I didn’t test the code but writing the simple_multinormal.stan with an array of vectors for the data (I also assigned directly to objects which is ever so slightly faster) allows the removal of the T loop. Curious how this changes the timing on your machine. The same update may be applied to the simple_multinormal_partials.stan code.

data {
    int<lower=0> T;
    array[T] vector[3] X;
}
transformed data {
    int N = 3;
}
parameters {
    vector[N] mu;
    vector<lower=0>[N] L_sigma;
    cholesky_factor_corr[N] L_Omega;
}
model {
    // priors
    mu ~ normal(0.15, 0.15);
    L_sigma ~ cauchy(0, 0.3);
    L_Omega ~ lkj_corr_cholesky(1);

    // model
    {
        matrix[N, N] L_Sigma = diag_pre_multiply(L_sigma, L_Omega);
        X ~ multi_normal_cholesky(mu, L_Sigma);
    }
}
generated quantities {
    matrix[N, N] Sigma = multiply_lower_tri_self_transpose(diag_pre_multiply(L_sigma, L_Omega));
}

1 Like

I could write something up, though I wrote up a case study last year and after writing it I haven’t heard much since then.

The individual regressions approach is really straightforward when you think about it in state space terms. In Econometrics, they have this concept of a structural VAR. If you remove the VAR aspect, then it kind of flows from if you assume a specific decomposition of the covariance matrix. Here (161.5 KB) is an incomplete draft of some notes I wrote up on it last year (before getting distracted with other things).

I think vines with non-normal marginals and normal copulas would work fine since the tricky part is the conditional CDF and the marginals are all converted to cumulative CDFs. I make no promises about non-normal copulas. That being said, I would need to do some tests to make sure I am right about this. A write-up could incorporate this.

In terms of other materials to understand them. Vine copulas certainly weren’t an easy subject for me to grok. I came back to them on and off a few times before I really understood what was going on (and I’m still no expert). This presentation (http://www.columbia.edu/~rf2283/Conference/2Models%20(1)%20Seagers.pdf) covers vines at the end in a pretty clear manner that helped me understand.

With respect to your example, I was going to try that, but I got lazy. It certainly would be what I would want to do in any write-up. The problem ultimately is that multi_normal and multi_normal_cholesky don’t scale well, even if you vectorize. Being able to show that in a write-up would be interesting.

2 Likes

The necessary matrix inversion in the multi-normal code really hurts scalability. In addition, I’ve seen issues with high dimensional construction of the Cholesky factor which is fixed by using the onion method.

I think it would be cool to generalize the regression approach for any dimension with a function. It would be really useful!

I found a master’s thesis (https://mediatum.ub.tum.de/doc/1694263/document.pdf) which did vine copulas in Stan but the relevant Stan functions are missing from the doc. Gives a good overview of vines and how to use them.

In the rvinecopulib they reference Aas, K., C. Czado, A. Frigessi, and H. Bakken (2009). Pair-copula constructions of multiple dependence. Insurance: Mathematics and Economics 44 (2), 182-198. There’s a bunch of conditional functions, they call h functions in the appendix for bivariate copulas.

1 Like

I think it is similar to the onion method. I might have a generalized version of the regression approach somewhere. I’ll need to look for it.

The papers look to be good references. Glad to see I was able to derive the Gaussian h-function on my own. This makes it a bit clearer that the only difference for the joint distribution if you have different marginal distributions would be that you need a step to calculate U for each and then get Uinv=inv_Phi(U). Everything else should be the same as what I have if you’re using bivariate Gaussian copulae.

1 Like

When I vectorize everything, the elapsed times look like below. fit_1 is the multi_normal, fit_2 is individual regressions, fit_3 uses partial correlations, fit_4 uses copulas. It doesn’t really change the conclusions, just the magnitudes of everything

> mean(get_elapsed_time(fit_1)[, 1] +  get_elapsed_time(fit_1)[, 2])
[1] 4.48875
> mean(get_elapsed_time(fit_2)[, 1] +  get_elapsed_time(fit_2)[, 2])
[1] 0.4755
> mean(get_elapsed_time(fit_3)[, 1] +  get_elapsed_time(fit_3)[, 2])
[1] 8.46725
> mean(get_elapsed_time(fit_4)[, 1] +  get_elapsed_time(fit_4)[, 2])
[1] 2.272
1 Like

This is a version of the multivariate normal with individual regressions above that works for any size input. It’s a little bit faster than the one above too, for some reason. I think I had an earlier forum post where I had a version of this that was fancier (included QR decompositions and normalized all the variables). This one is less complicated so it’s a bit easier to understand (the other one might have a typo in it too, but this produces pretty much the exact same mean/covariance as the other ones).

data {
    int<lower=0> T;
    int<lower=0> N;
    vector[T] X[N];
}
transformed data {
    matrix[T, N] X_mat;
    int k[N];
    
    X_mat[, 1] = X[1];
    k[1] = 0;
    for (i in 2:N) {
        k[i] = k[i - 1] + (i - 1);
        X_mat[, i] = X[i];
    }
}
parameters {
    vector[N] alphas;
    vector[k[N]] betas;
    vector<lower=0>[N] sigmas;
}
model {
    // priors
    alphas[1] ~ normal(0.15, 0.15);
    alphas[2:N] ~ normal(0, 0.15);
    betas ~ normal(1, 1);
    sigmas ~ cauchy(0, 0.3);

    X[1] ~ normal(alphas[1], sigmas[1]);
    {
        for (i in 2:N) {
            X[i] ~ normal(alphas[i] + X_mat[, 1:(i - 1)] * betas[(k[i - 1] + 1):k[i]], sigmas[i]);
        }
    }
}
generated quantities {
    vector[N] mu;
    matrix[N, N] Sigma;
    {
        matrix[N, N] B;
        matrix[N, N] L;
        for (i in 1:N) {
            for (j in 1:N) {
                if (j >= i) {
                    B[i, j] = 0;
                } else {
                    B[i, j] = betas[k[i - 1] + j];
                }
            }
        }
        L = -B;
        for (i in 1:N) {
            L[i, i] = L[i, i] + 1;
        }
        L = inverse(L);
        mu = L * alphas;
        Sigma = multiply_lower_tri_self_transpose(diag_post_multiply(L, sigmas));
    }
}
5 Likes

Thanks John and very cool stuff! I’m going to test this out in my programs. Can you wrap it in a function for easier reuse?

I’m curious how this may help with multivariate priors and other things that rely upon the multivariate normal.

I wonder if the multivariate student t may be able to be expressed similarly but would need to go through the math to verify.

1 Like

See below. I’m still using rstan 2.21, so I don’t have access to the to_int function, which makes some things a little more awkward (I vaguely recall being able to cast before, but I wasn’t able to get it working this time around).

I don’t precisely know for sure on Student t either.

functions {
    real multi_normal_indv_regressions_lpdf(vector x, vector alphas, vector betas, vector sigmas) {
        int N = num_elements(x);
        real output = 0;
        int k_prior = 0;
        int k_current = 0;

        output = output + normal_lpdf(x[1] | alphas[1], sigmas[1]);
        for (i in 2:N) {
            k_current = k_prior + i - 1;
            output = output + normal_lpdf(x[i] | alphas[i] + dot_product(x[1:(i - 1)], betas[(k_prior + 1):k_current]), sigmas[i]);
            k_prior = k_current;
        }
        return output;
    }
    real multi_normal_indv_regressions_matrix_lpdf(matrix X, vector alphas, vector betas, vector sigmas) {
        int N = num_elements(alphas);
        real output = 0;
        int k_prior = 0;
        int k_current = 0;

        output = output + normal_lpdf(X[, 1] | alphas[1], sigmas[1]);
        for (i in 2:N) {
            k_current = k_prior + i - 1;
            output = output + normal_lpdf(X[, i] | alphas[i] + X[, 1:(i - 1)] * betas[(k_prior + 1):k_current], sigmas[i]);
            k_prior = k_current;
        }
        return output;
    }
    matrix convertCoefficientVectorToCholesky(vector betas, int N) {
        //int N = to_int(0.5 * (sqrt(1 + 8 * num_elements(betas)) - 1)); // using quadratic formula to solve 0.5*x*(x+1)=num_elements(x)
        matrix[N, N] B;
        matrix[N, N] L;
        int k_prior = 0;
        int k_current = 0;

        for (i in 1:N) {
            k_current = k_prior + i - 1;
            for (j in 1:N) {
                if (j >= i) {
                    B[i, j] = 0;
                } else {
                    B[i, j] = betas[k_prior + j];
                }
            }
            k_prior = k_current;
        }
        L = -B;
        for (i in 1:N) {
            L[i, i] = L[i, i] + 1;
        }
        return inverse(L);
    }
}
data {
    int<lower=0> T;
    int<lower=0> N;
    matrix[T, N] X;
}
transformed data {
    int k[N];

    k[1] = 0;
    for (i in 2:N) {
        k[i] = k[i - 1] + (i - 1);
    }
}
parameters {
    vector[N] alphas;
    vector[k[N]] betas;
    vector<lower=0>[N] sigmas;
}
model {
    // priors
    alphas[1] ~ normal(0.15, 0.15);
    alphas[2:N] ~ normal(0, 0.15);
    betas ~ normal(1, 1);
    sigmas ~ cauchy(0, 0.3);

    X ~ multi_normal_indv_regressions_matrix(alphas, betas, sigmas);
}
generated quantities {
    vector[N] mu;
    matrix[N, N] Sigma;
    {
        matrix[N, N] L;
        L = convertCoefficientVectorToCholesky(betas, N);
        mu = L * alphas;
        Sigma = multiply_lower_tri_self_transpose(diag_post_multiply(L, sigmas));
    }
}
2 Likes

@jmh530 this is awesome!

I have two seemingly ‘simple’ questions in the grand scheme of things:

  1. Is there any reason to use the regression approach over the C Vine or vice versa? I particular ask in reference to the ability to learn the dependence structure underlying 54 variables. I have fit a Gaussian copula, but you mentioned the vine is more feasible in higher dimensions…

  2. Given that, do you know of a way to generalize your c vine code to 54 traits… i suppose im asking in terms of the most efficient way to do so in Stan…

What the C vine gets you is flexibility. You can choose each pair copula and all the marginals to be whatever distribution you want. The regression approach you could probably make alternate assumptions about the residuals (using a t-distribution for instance), but then you would no longer be able to do an analytic covariance matrix. You also are enforcing a linear relationship between the variables without any transformations that the C vine approach would allow. So if your data is approximately normal and linear, the regression approach could be a good starting point, but that might not be a good fit for everyone.

In terms of generalizing it, it shouldn’t be hard to generalize, I just haven’t gotten around to it. The problem is that it is much easier to express models in Stan when you know in advance exactly what the distributions you want to fit are. Some of the frequentist implementations of vines will have an option to fit multiple different dependence structures and then go with the best ones. To do something like that with Stan would require fitting lots of different models and if your data is large then it might take some time to do that. Being able to simplify the problem somehow will usually help in terms of running time when doing stuff with Stan. It’s usually better to start with a simple model and then make more complicated.