Would you be able to write up a C-vine example with 3 marginals in Stan?

Yes, I think I will.

I've attached some files (you can save them to a folder and just adjust the working directory in the file that starts with "S_") to compare a few different ways of fitting multivariate normals in Stan: 1) Stan's multi_normal_cholesky, 2) an individual-regressions approach I've discussed on the forums previously, 3) a partial-correlation approach that uses multi_normal, and 4) the C-vine copula approach (which is also copied below and borrows from your copula code).

The true correlation matrix is set up so that the partial correlation of variables 2 and 3 given variable 1 is zero.
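As a sketch of that setup (in Python, with illustrative values rather than the ones in the attached R script): a zero partial correlation of 2 and 3 given 1 pins down the (2, 3) entry as the product of the other two correlations.

```python
import numpy as np

# Illustrative values; the attached R script uses its own settings.
p12, p13 = 0.6, 0.4
# Zero partial correlation of 2 and 3 given 1 forces p23 = p12 * p13.
p23 = p12 * p13

R = np.array([[1.0, p12, p13],
              [p12, 1.0, p23],
              [p13, p23, 1.0]])

# Check: partial correlation of 2 and 3 given 1.
p23_g1 = (p23 - p12 * p13) / np.sqrt((1 - p12**2) * (1 - p13**2))
print(p23_g1)  # 0.0 (exact, by construction)
```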

I'm using slightly different priors, so they aren't exactly comparable, but they all give more-or-less the same results. The individual-regressions approach is the fastest (chains run around 0.5s or so on my machine with my settings) and results in the highest effective sample sizes. I directly calculated the covariance matrix instead of inverting a matrix, which probably works better as things get more complicated. The vine approach is the second fastest (chains running around 1.5-2s). The multi_normal_cholesky chains take around 5-10s and the partial-correlation version around 20s (I did this one since the output is comparable to the C-vine, and the difference in run-time reflects the slowdown from using the multi-normal instead of the bivariate).

Anyway, the C-vine example below is specifically set up for normal marginals and a normal copula. Things get a little more complicated when you move away from this (I start getting a bit beyond my skis here, but I found this explanation pretty good: https://tnagler.github.io/vine-arisa.pdf).

In the more general case, the copula here would take the conditional CDFs of the variables, so it looks something like c_23|1(F_2|1(x2|x1), F_3|1(x3|x1)). F_2|1(x2|x1) is the conditional CDF, calculated by differentiating the C_12 copula CDF with respect to its first argument F_1(x1) while holding the second argument fixed. It simplifies nicely for Gaussians, but not more generally, as you could imagine.
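(Not in the original post, but a numeric sanity check of that statement, sketched in Python with scipy and illustrative values: if the Gaussian h-function Phi((z2 - rho*z1)/sqrt(1 - rho^2)) really is the derivative of the Gaussian copula CDF in its first argument, integrating it back over the first argument should recover the copula CDF.)

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm, multivariate_normal

rho = 0.5          # copula correlation (illustrative)
u1, u2 = 0.3, 0.7  # evaluation point on the uniform scale (illustrative)
z2 = norm.ppf(u2)

def h(t):
    # Claimed conditional CDF F_2|1, i.e. dC_12/du1 evaluated at (t, u2):
    # Phi((z2 - rho * z1) / sqrt(1 - rho^2)) with z1 = inv_Phi(t)
    z1 = norm.ppf(t)
    return norm.cdf((z2 - rho * z1) / np.sqrt(1 - rho**2))

# Integrating the claimed derivative from 0 to u1 should recover the
# Gaussian copula CDF C_12(u1, u2) = Phi_2(inv_Phi(u1), inv_Phi(u2); rho).
C_from_h, _ = quad(h, 0, u1)
C_exact = multivariate_normal(cov=[[1, rho], [rho, 1]]).cdf([norm.ppf(u1), z2])
```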

There is an R package, VineCopula, that looks good, though it looks like the bleeding edge is a C++ implementation called vinecopulib (with R/Python interfaces).

```
functions {
  real normal_copula_vector_lpdf(vector u, vector v, real rho) {
    int N = num_elements(u);
    real rho_sq = square(rho);
    real a1 = 0.5 * rho;
    real a2 = rho_sq - 1;
    real a3 = 0.5 * log1m(rho_sq);
    real x = -2 * u' * v + rho * (dot_self(u) + dot_self(v));
    return a1 * x / a2 - N * a3;
  }
}
data {
  int<lower=0> T;
  matrix[T, 3] X;
}
transformed data {
  int N = 3;
}
parameters {
  vector[N] mu;
  vector<lower=0>[N] sigma;
  real<lower=0,upper=1> p1_raw;
  real<lower=0,upper=1> p12_raw;
  real<lower=0,upper=1> p13_raw;
  real<lower=0,upper=1> p23_given_1_raw;
}
transformed parameters {
  real<lower=-1,upper=1> p12;
  real<lower=-1,upper=1> p13;
  real<lower=-1,upper=1> p23_given_1;
  p12 = (p12_raw - 0.5) * 2;
  p13 = (p13_raw - 0.5) * 2;
  p23_given_1 = (p23_given_1_raw - 0.5) * 2;
}
model {
  // priors
  mu ~ normal(0.15, 0.15);
  sigma ~ cauchy(0, 0.3);
  p1_raw ~ beta(19, 1);
  p12_raw ~ normal(p1_raw, 0.1);
  p13_raw ~ normal(p1_raw, 0.1);
  p23_given_1_raw ~ beta(1, 1);
  // univariate marginals
  for (i in 1:3) {
    X[, i] ~ normal(mu[i], sigma[i]);
  }
  {
    matrix[T, N] Uinv;
    matrix[T, N - 1] Uinv_2nd_level;
    for (i in 1:N) {
      // Since assuming normal marginals:
      // U = Phi((x - mu) / sigma), and
      // U_inv = inv_Phi(U) = (x - mu) / sigma
      Uinv[, i] = (X[, i] - mu[i]) / sigma[i];
    }
    // fit first level
    Uinv[, 2] ~ normal_copula_vector(Uinv[, 1], p12);
    Uinv[, 3] ~ normal_copula_vector(Uinv[, 1], p13);
    // fit second level
    // The general approach is to get each conditional CDF. This requires
    // taking the derivative of the copula CDF while holding one variable
    // constant. Things are simpler here because I'm using normal marginals
    // and copulae, but this doesn't hold generally. What I'm doing here
    // is essentially regressing variables 2 and 3 on variable 1 and
    // dividing by the residual standard deviation. Since we have
    // standardized the data above, it simplifies the calculations here
    // and I don't need to do the regression because the correlations are
    // re-used from above. Works for normals, but not more generally.
    Uinv_2nd_level[, 1] = (Uinv[, 2] - p12 * Uinv[, 1]) / sqrt(1 - square(p12));
    Uinv_2nd_level[, 2] = (Uinv[, 3] - p13 * Uinv[, 1]) / sqrt(1 - square(p13));
    Uinv_2nd_level[, 2] ~ normal_copula_vector(Uinv_2nd_level[, 1], p23_given_1);
  }
}
generated quantities {
  real p23;
  p23 = p12 * p13 + p23_given_1 * sqrt(1 - square(p12)) * sqrt(1 - square(p13));
}
```
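A quick check of the algebra in `normal_copula_vector_lpdf` (my own verification sketch in Python, not part of the model): the closed form should equal the sum over observations of the bivariate standard normal log density minus the two standard normal marginal log densities.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

rng = np.random.default_rng(1)
rho = 0.4               # illustrative correlation
N = 5
u = rng.normal(size=N)  # standardized scores, i.e. inv_Phi(U)
v = rng.normal(size=N)

# Closed form used in normal_copula_vector_lpdf
rho_sq = rho**2
x = -2 * u @ v + rho * (u @ u + v @ v)
lp_closed = 0.5 * rho * x / (rho_sq - 1) - N * 0.5 * np.log1p(-rho_sq)

# Direct: log c(u_i, v_i) = log phi_2(u_i, v_i; rho) - log phi(u_i) - log phi(v_i)
biv = multivariate_normal(cov=[[1, rho], [rho, 1]])
lp_direct = sum(biv.logpdf([ui, vi]) - norm.logpdf(ui) - norm.logpdf(vi)
                for ui, vi in zip(u, v))
```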

S_Simple_Cvine_Copula.r (4.6 KB)

simple_cvine_copula.stan (2.8 KB)

simple_multinormal.stan (675 Bytes)

simple_multinormal_indv_regressions.stan (1.5 KB)

simple_multinormal_partials.stan (1.3 KB)

Thanks for putting this together! It's nice to see these compared; it would probably make a really nice overview paper or case study. Do you have any literature for the individual-regressions approach? Seeing the math would be helpful for generalization, though it must just follow from the conditional normal.

I'm interested in non-normal marginals for the vine; do you have any examples using parametric bivariate constructions with non-normal marginals? The reference you provided is good. If I wasn't so busy (or, maybe, lazy :)) I'd work out a few conditional copulas from it. I'd appreciate anything you may have to help get started here.

As for the speed of the partials and multinormal, I believe you'll get a more fair comparison if you vectorize over `T`. The derivatives are really slowing things down because they're being calculated on each row of data instead of once for the entire vector. I didn't test the code, but writing `simple_multinormal.stan` with an array of vectors for the data (I also assigned directly to objects, which is ever so slightly faster) allows the removal of the `T` loop. Curious how this changes the timing on your machine. The same update may be applied to the `simple_multinormal_partials.stan` code.

```
data {
  int<lower=0> T;
  array[T] vector[3] X;
}
transformed data {
  int N = 3;
}
parameters {
  vector[N] mu;
  vector<lower=0>[N] L_sigma;
  cholesky_factor_corr[N] L_Omega;
}
model {
  // priors
  mu ~ normal(0.15, 0.15);
  L_sigma ~ cauchy(0, 0.3);
  L_Omega ~ lkj_corr_cholesky(1);
  // model
  {
    matrix[N, N] L_Sigma = diag_pre_multiply(L_sigma, L_Omega);
    X ~ multi_normal_cholesky(mu, L_Sigma);
  }
}
generated quantities {
  matrix[N, N] Sigma = multiply_lower_tri_self_transpose(diag_pre_multiply(L_sigma, L_Omega));
}
```

I could write something up, though I wrote a case study last year and haven't heard much since.

The individual-regressions approach is really straightforward when you think about it in state-space terms. In econometrics, there is the concept of a structural VAR. If you remove the VAR aspect, then it kind of flows from assuming a specific decomposition of the covariance matrix. Here (161.5 KB) is an incomplete draft of some notes I wrote up on it last year (before I got distracted with other things).
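As a numerical illustration of that decomposition (Python, with made-up covariance values, not from the attached notes): regress each variable on its predecessors, collect the coefficients in a strictly lower-triangular B and the residual variances in a diagonal, and L = (I - B)^-1 reconstructs the covariance exactly.

```python
import numpy as np

# Made-up covariance matrix, just to exercise the identity.
Sigma = np.array([[1.0, 0.5, 0.3],
                  [0.5, 2.0, 0.6],
                  [0.3, 0.6, 1.5]])
N = Sigma.shape[0]

B = np.zeros((N, N))  # regression coefficients, strictly lower triangular
s2 = np.zeros(N)      # residual variances
s2[0] = Sigma[0, 0]
for i in range(1, N):
    # Population regression of variable i on variables 0..i-1
    beta = np.linalg.solve(Sigma[:i, :i], Sigma[:i, i])
    B[i, :i] = beta
    s2[i] = Sigma[i, i] - Sigma[i, :i] @ beta

# x = B x + e  =>  x = (I - B)^-1 e, so Sigma = L diag(s2) L'
L = np.linalg.inv(np.eye(N) - B)
Sigma_rec = L @ np.diag(s2) @ L.T
```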

I think vines with non-normal marginals and normal copulas would work fine, since the tricky part is the conditional CDF and the marginals are all converted to uniform via their CDFs. I make no promises about non-normal copulas. That said, I would need to do some tests to make sure I'm right about this. A write-up could incorporate this.
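(A small Python sketch of that conversion step, with an illustrative exponential marginal: the probability integral transform makes any continuous marginal uniform, and inv_Phi then gives the standard normal scores the Gaussian copula works with.)

```python
import numpy as np
from scipy.stats import expon, norm

# Illustrative non-normal marginal: Exponential(scale = 2)
rng = np.random.default_rng(0)
x = expon(scale=2.0).rvs(size=100_000, random_state=rng)

u = expon(scale=2.0).cdf(x)  # probability integral transform: U ~ Uniform(0, 1)
z = norm.ppf(u)              # inv_Phi(U): standard normal scores for the copula
```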

As for other materials for understanding them: vine copulas certainly weren't an easy subject for me to grok. I came back to them on and off a few times before I really understood what was going on (and I'm still no expert). This presentation (http://www.columbia.edu/~rf2283/Conference/2Models%20(1)%20Seagers.pdf) covers vines at the end in a pretty clear manner that helped me understand.

With respect to your example, I was going to try that, but I got lazy. It certainly would be what I would want to do in any write-up. The problem ultimately is that `multi_normal` and `multi_normal_cholesky` don't scale well, even if you vectorize. Being able to show that in a write-up would be interesting.

The necessary matrix inversion in the multi-normal code really hurts scalability. In addition, I've seen issues with high-dimensional construction of the Cholesky factor, which is fixed by using the onion method.

I think it would be cool to generalize the regression approach for any dimension with a function. It would be really useful!

I found a master's thesis (https://mediatum.ub.tum.de/doc/1694263/document.pdf) which did vine copulas in Stan, though the relevant Stan functions are missing from the document. It gives a good overview of vines and how to use them.

In `rvinecopulib` they reference Aas, K., C. Czado, A. Frigessi, and H. Bakken (2009). Pair-copula constructions of multiple dependence. Insurance: Mathematics and Economics 44(2), 182-198. There's a bunch of conditional functions, which they call `h` functions, in the appendix for bivariate copulas.

I think it is similar to the onion method. I might have a generalized version of the regression approach somewhere; I'll need to look for it.

The papers look to be good references. Glad to see I was able to derive the Gaussian h-function on my own. This makes it a bit clearer that, with different marginal distributions, the only change to the joint distribution would be an extra step to calculate U for each marginal and then get `Uinv = inv_Phi(U)`. Everything else should be the same as what I have, as long as you're using bivariate Gaussian copulae.

When I vectorize everything, the elapsed times look like below: `fit_1` is the multi_normal, `fit_2` is individual regressions, `fit_3` uses partial correlations, and `fit_4` uses copulas. It doesn't really change the conclusions, just the magnitudes of everything.

```
> mean(get_elapsed_time(fit_1)[, 1] + get_elapsed_time(fit_1)[, 2])
[1] 4.48875
> mean(get_elapsed_time(fit_2)[, 1] + get_elapsed_time(fit_2)[, 2])
[1] 0.4755
> mean(get_elapsed_time(fit_3)[, 1] + get_elapsed_time(fit_3)[, 2])
[1] 8.46725
> mean(get_elapsed_time(fit_4)[, 1] + get_elapsed_time(fit_4)[, 2])
[1] 2.272
```

This is a version of the individual-regressions multivariate normal above that works for any size input. It's also a little bit faster than the one above, for some reason. I think I had an earlier forum post with a fancier version (it included QR decompositions and normalized all the variables). This one is less complicated, so it's a bit easier to understand (the other one might have a typo in it too, but this produces pretty much the exact same mean/covariance as the other approaches).

```
data {
  int<lower=0> T;
  int<lower=0> N;
  vector[T] X[N];
}
transformed data {
  matrix[T, N] X_mat;
  int k[N];
  X_mat[, 1] = X[1];
  k[1] = 0;
  for (i in 2:N) {
    k[i] = k[i - 1] + (i - 1);
    X_mat[, i] = X[i];
  }
}
parameters {
  vector[N] alphas;
  vector[k[N]] betas;
  vector<lower=0>[N] sigmas;
}
model {
  // priors
  alphas[1] ~ normal(0.15, 0.15);
  alphas[2:N] ~ normal(0, 0.15);
  betas ~ normal(1, 1);
  sigmas ~ cauchy(0, 0.3);
  X[1] ~ normal(alphas[1], sigmas[1]);
  {
    for (i in 2:N) {
      X[i] ~ normal(alphas[i] + X_mat[, 1:(i - 1)] * betas[(k[i - 1] + 1):k[i]], sigmas[i]);
    }
  }
}
generated quantities {
  vector[N] mu;
  matrix[N, N] Sigma;
  {
    matrix[N, N] B;
    matrix[N, N] L;
    for (i in 1:N) {
      for (j in 1:N) {
        if (j >= i) {
          B[i, j] = 0;
        } else {
          B[i, j] = betas[k[i - 1] + j];
        }
      }
    }
    L = -B;
    for (i in 1:N) {
      L[i, i] = L[i, i] + 1;
    }
    L = inverse(L);
    mu = L * alphas;
    Sigma = multiply_lower_tri_self_transpose(diag_post_multiply(L, sigmas));
  }
}
```

Thanks John, very cool stuff! I'm going to test this out in my programs. Can you wrap it in a function for easier reuse?

I'm curious how this may help with multivariate priors and other things that rely upon the multivariate normal.

I wonder if the multivariate Student t could be expressed similarly, but I would need to go through the math to verify.

See below. I'm still using rstan 2.21, so I don't have access to the to_int function, which makes some things a little more awkward (I vaguely recall being able to cast before, but I wasn't able to get it working this time around).
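(For reference, the arithmetic that cast would handle is just the quadratic formula for recovering N from the length of betas, since betas has N(N-1)/2 elements; a quick Python sketch, as the function below takes N as an argument instead.)

```python
import math

def n_from_betas(n_betas):
    # betas has N * (N - 1) / 2 elements, so solve the quadratic for N.
    return int(round((1 + math.sqrt(1 + 8 * n_betas)) / 2))
```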

I don't know for sure about the Student t either.

```
functions {
  real multi_normal_indv_regressions_lpdf(vector x, vector alphas, vector betas, vector sigmas) {
    int N = num_elements(x);
    real output = 0;
    int k_prior = 0;
    int k_current = 0;
    output = output + normal_lpdf(x[1] | alphas[1], sigmas[1]);
    for (i in 2:N) {
      k_current = k_prior + i - 1;
      output = output + normal_lpdf(x[i] | alphas[i] + dot_product(x[1:(i - 1)], betas[(k_prior + 1):k_current]), sigmas[i]);
      k_prior = k_current;
    }
    return output;
  }
  real multi_normal_indv_regressions_matrix_lpdf(matrix X, vector alphas, vector betas, vector sigmas) {
    int N = num_elements(alphas);
    real output = 0;
    int k_prior = 0;
    int k_current = 0;
    output = output + normal_lpdf(X[, 1] | alphas[1], sigmas[1]);
    for (i in 2:N) {
      k_current = k_prior + i - 1;
      output = output + normal_lpdf(X[, i] | alphas[i] + X[, 1:(i - 1)] * betas[(k_prior + 1):k_current], sigmas[i]);
      k_prior = k_current;
    }
    return output;
  }
  matrix convertCoefficientVectorToCholesky(vector betas, int N) {
    // With to_int available, N could instead be recovered from betas by
    // solving 0.5 * N * (N - 1) = num_elements(betas) with the quadratic formula:
    // int N = to_int(0.5 * (sqrt(1 + 8 * num_elements(betas)) + 1));
    matrix[N, N] B;
    matrix[N, N] L;
    int k_prior = 0;
    int k_current = 0;
    for (i in 1:N) {
      k_current = k_prior + i - 1;
      for (j in 1:N) {
        if (j >= i) {
          B[i, j] = 0;
        } else {
          B[i, j] = betas[k_prior + j];
        }
      }
      k_prior = k_current;
    }
    L = -B;
    for (i in 1:N) {
      L[i, i] = L[i, i] + 1;
    }
    return inverse(L);
  }
}
data {
  int<lower=0> T;
  int<lower=0> N;
  matrix[T, N] X;
}
transformed data {
  int k[N];
  k[1] = 0;
  for (i in 2:N) {
    k[i] = k[i - 1] + (i - 1);
  }
}
parameters {
  vector[N] alphas;
  vector[k[N]] betas;
  vector<lower=0>[N] sigmas;
}
model {
  // priors
  alphas[1] ~ normal(0.15, 0.15);
  alphas[2:N] ~ normal(0, 0.15);
  betas ~ normal(1, 1);
  sigmas ~ cauchy(0, 0.3);
  X ~ multi_normal_indv_regressions_matrix(alphas, betas, sigmas);
}
generated quantities {
  vector[N] mu;
  matrix[N, N] Sigma;
  {
    matrix[N, N] L;
    L = convertCoefficientVectorToCholesky(betas, N);
    mu = L * alphas;
    Sigma = multiply_lower_tri_self_transpose(diag_post_multiply(L, sigmas));
  }
}
```