Gaussian copula for spherical data

Hi all,

I would like to work with data that are distributed on a hypersphere of arbitrary dimension. As a first step towards that, I'm familiarising myself with synthetic data on a simple 3D sphere.

While distributions like the von Mises have generalisations to higher dimensions, the normalising constant is a pain, so I was looking into using a Gaussian copula to account for dependence across the dimensions.

It's the first time I've approached a copula model, and I have been following this example. I generate a sample of 100 or 1000 observations on the sphere; the angles look fine, the data are constrained to the correct interval, etc.

The parameters I am setting right now look like this:

\mu = [0.6\pi, 0.3\pi]
\kappa = [18,23]
\rho = -0.5
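For concreteness, here is a sketch (my own assumption about the generation scheme, using the parameter values above) of how such synthetic data could be produced with a Gaussian copula and von Mises marginals:

```python
import numpy as np
from scipy.stats import norm, vonmises

# Parameter values from the post; the generation scheme itself is an
# assumption about how the synthetic data might be built.
mu = np.array([0.6 * np.pi, 0.3 * np.pi])
kappa = np.array([18.0, 23.0])
rho = -0.5

rng = np.random.default_rng(42)
N = 1000

# 1. draw correlated standard normals with the target correlation
Sigma = np.array([[1.0, rho], [rho, 1.0]])
z = rng.multivariate_normal(np.zeros(2), Sigma, size=N)

# 2. push through the standard normal CDF to get dependent uniforms
u = norm.cdf(z)

# 3. apply the von Mises quantile function to each margin
y = np.column_stack([vonmises.ppf(u[:, k], kappa[k], loc=mu[k])
                     for k in range(2)])
```

Each row of `y` is then an angle pair on the sphere, with the dependence coming entirely from the copula and the marginals from the von Mises quantile transform.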

The issue is, no matter what I do, I always get wildly incorrect posteriors with massive convergence issues (high R-hat, low ESS, very low E-BFMI, hitting max treedepth, but basically no divergent transitions). I already switched from the pdf to the log-pdf of the von Mises and standard normal distributions, which somewhat reduced the number of warnings during sampling, but I still occasionally get values outside the [0, 1] interval.

I have tried using more informative priors on the location and scale parameters, and even fixing one of the two to its true value, to no avail. I also tried the trick of switching to a normal approximation for \kappa > 100, but it didn't make a difference. I suspect the issue arises from numerical error more than from the model specification per se, but I might be wrong.
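One hypothetical source of the out-of-range values: CDFs evaluated in floating point can round to exactly 0 or 1 (or drift just outside) in the deep tails, which sends the normal quantile to plus or minus infinity. A small sketch of a clipping guard, using SciPy in place of the Stan functions:

```python
import numpy as np
from scipy.stats import norm, vonmises

# Hypothetical guard for the "values outside [0, 1]" symptom: clip the
# pseudo-observations into the open interval before the normal quantile,
# so deep-tail CDF values cannot produce infinities.
mu, kappa = 0.3 * np.pi, 23.0
y = np.array([mu - np.pi + 1e-12, mu, mu + np.pi - 1e-12])  # deep tails + center

u = vonmises.cdf(y, kappa, loc=mu)
eps = np.finfo(float).eps
u_safe = np.clip(u, eps, 1.0 - eps)
q = norm.ppf(u_safe)  # finite for all three points after clipping
```

In Stan the analogous move is to keep everything on the log scale (lcdf plus std_normal_log_qf) and, if needed, bound the lcdf values away from 0.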

If anyone has any tips on this I would be grateful!

Here is the model:

functions {  
  // Gaussian copula log density; u holds log-scale CDF values, which
  // std_normal_log_qf maps to standard normal quantiles.
  real gauss_copula_cholesky_lpdf(matrix u, matrix L) {
    array[rows(u)] row_vector[cols(u)] q;
    for (n in 1:rows(u)) {
      q[n] = std_normal_log_qf(u[n]);
    }

    return multi_normal_cholesky_lpdf(q | rep_row_vector(0, cols(L)), L)
            - std_normal_lpdf(to_vector(to_matrix(q)));
  }
}

data {
  int<lower=0> N;
  int<lower=0> K;
  matrix[N, K] Y;
}

parameters {
  vector<lower=0>[K] kappa;
  vector<lower=-pi(), upper=pi()>[K] mu;
  cholesky_factor_corr[K] rho_chol;
}

model {

  rho_chol ~ lkj_corr_cholesky(2.0);
  kappa ~ exponential(0.05);
  
  mu ~ von_mises(0.0, 0.0);
  
  matrix[N, K] u;  // log-CDF pseudo-observations

  for (n in 1:N) {
    for (k in 1:K){
       u[n, k] = von_mises_lcdf(Y[n, k] | mu[k], kappa[k]);
    }
  }
  
  for (k in 1:K){
    Y[,k] ~ von_mises(mu[k], kappa[k]);
  }

  u ~ gauss_copula_cholesky(rho_chol);
}

generated quantities {
  matrix[K,K] rho = multiply_lower_tri_self_transpose(rho_chol);
}
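For reference, the identity the copula function above implements can be checked numerically. A NumPy sketch (my own check, not part of the model): the Gaussian copula log density is the multivariate normal log density of the normal quantiles minus the standard normal log densities, and it vanishes when the correlation matrix is the identity.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

# log c(u; Sigma) = log N(q | 0, Sigma) - sum_i log phi(q_i), q = Phi^{-1}(u)
rho = -0.5
Sigma = np.array([[1.0, rho], [rho, 1.0]])
rng = np.random.default_rng(0)
u = rng.uniform(0.05, 0.95, size=(5, 2))  # pseudo-observations in (0, 1)
q = norm.ppf(u)

log_copula = (multivariate_normal(np.zeros(2), Sigma).logpdf(q)
              - norm.logpdf(q).sum(axis=1))

# Sanity check: with the identity correlation matrix the copula density
# is 1, so its log should vanish for every observation.
log_copula_indep = (multivariate_normal(np.zeros(2), np.eye(2)).logpdf(q)
                    - norm.logpdf(q).sum(axis=1))
```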

Hi, @carlo. Sorry this didn't get answered; I'm afraid I don't know the answer, either. This is a pretty esoteric combination of geometry and statistics, and as far as I know, the Stan team doesn't have any experts in this.

The general issue with hyperspherical models is that it's tricky to define the right kind of posterior means. I would start by looking there, because the simple means that Stan computes on parameters are not going to give you what you want. The problem with the way you've defined mu is that the mass can wrap around pi() and wind up looking bimodal to Stan. You can mitigate this somewhat by using a unit vector for the mean to ensure it lives on the unit (K-1)-sphere.

unit_vector[K] mu;
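A quick numerical sketch of the wrap-around problem and why the unit-vector representation helps: two angles that are nearly identical on the circle are nearly 2*pi apart as reals, but their (cos t, sin t) representations are close, and atan2 recovers the angle.

```python
import numpy as np

t1, t2 = np.pi - 0.01, -np.pi + 0.01  # close on the circle, far apart as reals
gap_as_reals = abs(t1 - t2)           # almost 2*pi

v1 = np.array([np.cos(t1), np.sin(t1)])
v2 = np.array([np.cos(t2), np.sin(t2)])
gap_on_circle = np.linalg.norm(v1 - v2)  # small chord length

recovered = np.arctan2(v1[1], v1[0])  # maps the unit vector back to the angle
```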

I also think you need to find a 2D von Mises rather than just taking all the marginals to be von Mises. Also, doesn't the second parameter of the von Mises need to be strictly greater than 0?

With exponential(0.05) you will be getting a prior expectation of 20 for kappa—is that realistic?
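For reference, a quick check of the implied prior moments (SciPy parameterises the exponential by scale = 1/rate):

```python
from scipy.stats import expon

# Stan's exponential(rate) has mean 1/rate; rate = 0.05 puts the prior
# mean for kappa at 20, with sd also 20, so it is still fairly diffuse.
rate = 0.05
prior = expon(scale=1.0 / rate)
prior_mean, prior_sd = prior.mean(), prior.std()
```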

I’m afraid I don’t know anything about copulas or what might be going wrong there.

P.S. It’s also tricky to talk about dimensionality of spheres. The sphere that’s embedded in R^3 and defined by x^2 + y^2 + z^2 = 1 forms a 2-dimensional manifold.


Hi @Bob_Carpenter, thank you for answering! I’m aware this is somewhat of a weird combination of problems, I did not expect a ton of feedback, but it was worth a try.

Thank you for the tip about using the unit vector, I’ll try that.

The point of using a Gaussian copula model is exactly to sidestep the need to define higher-dimensional von Mises distributions, since past 2–3 dimensions the normalisation becomes very difficult to compute.

The prior for kappa makes sense to me, the true values in the simulation are 18 and 23, and in real data I expect even higher values of concentration.

Regarding the sphere, yes, at the moment I’m working with a sphere in 3D coordinates, but I then convert those into spherical coordinates, i.e. each point becomes a vector identified by two angles.
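A sketch of the conversion I mean here (assuming a colatitude/longitude convention; the exact convention is my assumption):

```python
import numpy as np

def to_angles(xyz):
    # colatitude theta in [0, pi], longitude phi in (-pi, pi]
    theta = np.arccos(np.clip(xyz[..., 2], -1.0, 1.0))
    phi = np.arctan2(xyz[..., 1], xyz[..., 0])
    return theta, phi

def to_xyz(theta, phi):
    # inverse map: angle pair back to a point on the unit sphere
    return np.stack([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)], axis=-1)

rng = np.random.default_rng(3)
p = rng.normal(size=(100, 3))
p /= np.linalg.norm(p, axis=1, keepdims=True)  # project onto the unit sphere
theta, phi = to_angles(p)
```

The round trip to_xyz(to_angles(p)) recovers the original points, so no information is lost by working with the two angles.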

In general I know it should be possible to do something like this, I’m basically trying to emulate a recent example (https://arxiv.org/abs/2406.04085), just in a Bayesian framework.

While reading the paper, I encountered complex-valued probability density functions (pdfs). Although I am not very familiar with complex pdfs, I recently implemented one, and it works well.
The copula specified is just the normal copula.

Take a look at:
https://mc-stan.org/docs/stan-users-guide/complex-numbers.html#dependent-complex-error
There is an example there of how to access real and complex values and how to fit with a multivariate normal distribution.
Isn’t this solution simply defining a 2K matrix (where K is the number of PDFs) and then proceeding with copulas as usual in Stan?

Hi! Thank you for the suggestion; however, I'm a bit confused: if you are referring to \mathbb{C}, as far as I understood, that is the notation the authors use for the unit circle.

True, it's not complex; it's the unit circle. I assumed they used complex-valued pdfs.

This may be interesting too:

But can you try this function? Here it is inside the full model:
functions {
  real multi_normal_cholesky_copula_lpdf(matrix U, matrix L) {
    int N = rows(U); // total number of observations
    int J = cols(U); // total number of responses
    matrix[J, J] Gammainv = chol2inv(L);
    return -N * sum(log(diagonal(L))) - 0.5 * sum(add_diag(Gammainv, -1.0) .* crossprod(U));
  }
}

data {
  int<lower=0> N;
  int<lower=0> K;
  matrix[N, K] Y;
}

parameters {
  vector<lower=0>[K] kappa;
  vector<lower=-pi(), upper=pi()>[K] mu;
  cholesky_factor_corr[K] rho_chol;
}

model {

  rho_chol ~ lkj_corr_cholesky(2.0);
  kappa ~ exponential(0.05);
  
  mu ~ von_mises(0.0, 0.0);
  
  matrix[N, K] u;  // standard normal quantiles of the pseudo-observations

  for (n in 1:N) {
    for (k in 1:K){
       u[n, k] = std_normal_log_qf(von_mises_lcdf(Y[n, k] | mu[k], kappa[k]));
    }
  }
  
  for (k in 1:K){
    Y[,k] ~ von_mises(mu[k], kappa[k]);
  }

  u ~ multi_normal_cholesky_copula(rho_chol);
}

generated quantities {
  matrix[K,K] rho = multiply_lower_tri_self_transpose(rho_chol);
}

Just an FYI that the two likelihoods are the same; the only difference here is the use of _log_qf and _lcdf instead of _cdf directly. Additionally, this expression of the likelihood is less numerically stable and less computationally efficient, as it has to compute a matrix inverse as well as autodiff through multiple matrix expressions.
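The equivalence of the two likelihoods can be verified numerically. A NumPy sketch (my own check): the trace form -N * sum(log diag(L)) - 0.5 * sum((Gamma^-1 - I) .* U'U) matches the "multivariate normal minus standard normal" form, given U already on the z-score scale.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

rho = -0.5
Gamma = np.array([[1.0, rho], [rho, 1.0]])
L = np.linalg.cholesky(Gamma)
rng = np.random.default_rng(1)
U = rng.multivariate_normal(np.zeros(2), Gamma, size=50)  # z-scale pseudo-obs
N = U.shape[0]

# trace form, as in the Stan function above
Gammainv = np.linalg.inv(Gamma)
trace_form = (-N * np.log(np.diag(L)).sum()
              - 0.5 * np.sum((Gammainv - np.eye(2)) * (U.T @ U)))

# direct form: multivariate normal log density minus standard normal margins
direct_form = (multivariate_normal(np.zeros(2), Gamma).logpdf(U).sum()
               - norm.logpdf(U).sum())
```

The two quantities agree up to floating-point error, which is why only stability and efficiency, not correctness, distinguish the implementations.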

We gotta add Gaussian copula to Stan!


In case it helps, Stan doesn’t need normalized densities for sampling.

There are a lot of ways to define normals over complex numbers depending on how you handle the real and imaginary components and any possible correlation. And there’s definitely a relation between circular stats and complex numbers.

That’d be great. Then I could learn how to use copulas :-).