LDLT_Factor of random variable is not positive definite

Hi everyone,

I am trying to fit a Gaussian Process with a Covariance function specified by a prior Wishart distribution.
I am simulating some random data in R and then fit the model. I always get the error:
LDLT_Factor of random variable is not positive definite.
which is weird because the Covariance Matrix should be positive definite and symmetric by definition.

Here the full code:

library(rstan)
library(fields)

set.seed(999)

options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)

N <- 100

x <- seq(0, 1, length.out = sqrt(N))

index <- structure(as.matrix(expand.grid(x, x)))

K <- Exponential(as.matrix(dist(index)), alpha = .5)

r_w <- rWishart(N, N, K)[,,1]
rmv <- drop(rmvnorm(n = 1, mu = 0, Sigma = r_w))


fit <-
  stan(
    file = "./example1.stan",
    data = list(N = N,
                index = index,
                y_obs = rmv),
    chains = 4,
    iter = 1000
  )

and the relative stan code:

data {
  int<lower=1> N;
  array[N] vector[2] index;
  vector[N] y_obs;
}


parameters{
  real<lower=0> sigma;
  real<lower=0> length_scale;
  cov_matrix[N] KW;
}

transformed parameters{
  cov_matrix[N] K = gp_exponential_cov(index, sigma, length_scale) + diag_matrix(rep_vector(1e-9, N));
}

model{
  sigma ~ gamma(1, 1);
  length_scale ~ gamma(1, 1);
  KW ~ wishart(N, K);
  
  y_obs ~ multi_normal(rep_vector(0, N), KW);
}

Is there anything I am doing wrong?

Thank you,
Marco

Unfortunately, large covariance matrices have initialization problems. I believe if you try this with N < 30 then it will work. Once you get over a dimension of ~30 then you will start to see this error.

You can use the Using the onion method in occupancy model hierarchical model - #2 by spinkney.

This will give you a cholesky factor that you can multiply by the transpose of itself to get a corr mat. Then multiply by the vector of variances to get a cov mat.

Or you can use the wishart_cholesky directly

functions {
matrix onion_cholesky_lp(
		       int N,
                       real eta,
		       vector r_2,
		       row_vector l_omega) {       
  {
    matrix[n_pred, n_pred] L_Omega = rep_matrix(0, N, N);
    real alpha = eta + (N - 2.0) / 2.0;
    vector[N - 1] shape1;
    vector[N - 1] shape2;
 
     shape1[1] = alpha;
     shape2[1] = alpha;

    int start = 1;
    int end = 2;

    L_Omega[1,1] = 1.0;
    L_Omega[2,1] = 2.0 * r_2[1] - 1.0;
    L_Omega[2,2] = sqrt(1.0 - square(L_Omega[2,1]));
    for (k in 2:N - 1) {
      int kp1 = k + 1;
      row_vector[k] l_row = segment(l_omega, start, k);
      real scale = sqrt(r_2[k] / dot_self(l_row));
      L_Omega[kp1, 1:k] = l_row[1:k] * scale;
      L_Omega[kp1, kp1] = sqrt(1.0 - r_2[k]);
      
     alpha -= 0.5;
    shape1[k] = k / 2.0;
    shape2[k] = alpha;

      start = end + 1;
      end = start + k - 1;
    }

    target += beta_lpdf(r_2, shape1, shape2);

    return L_Omega;
  }
}
}
data {
  int<lower=1> N;
  array[N] vector[2] index;
  vector[N] y_obs;
  real<lower=0> eta;
}


parameters{
  real<lower=0> sigma;
  real<lower=0> length_scale;
 row_vector[choose(N, 2) - 1]  l_omega_p;
vector<lower = 0,upper = 1>[N - 1] R2;
 // cov_matrix[N] KW;
}

transformed parameters{
  matrix[N, N] L_KW = onion_cholesky(N, eta, R2, l_omega_p);
  matrix[N, N] L_K = cholesky_decompose(gp_exponential_cov(index, sigma, length_scale) + diag_matrix(rep_vector(1e-9, N)));
}

model{
  sigma ~ gamma(1, 1);
  length_scale ~ gamma(1, 1);
  L_KW ~ wishart_cholesky(N, L_K);
  
  y_obs ~ multi_normal_cholesky(rep_vector(0, N), L_KW);
}

Thank you very much for your answer,
One last question, you wrote:

  L_KW ~ wishart_cholesky(N, L_K);
  
  y_obs ~ multi_normal_cholesky(rep_vector(0, N), L_KW);

should it be this instead?:

  L_KW ~ wishart_cholesky(N, L_K);
  
  y_obs ~ multi_normal_cholesky(rep_vector(0, N), cholesky_decompose(L_KW));

Thank you,
Marco

No, L_KW is already a cholesky_factor. A few additional things. I updated so that the method outputs a uniform cholesky factor of correlation matrices. You can then apply another prior on top. Then, to get a cholesky factor of a covariance matrix you need to multiply by a vector of standard deviations.
Lastly, the inverse wishart is what is the proper distribution for a covariance matrix. If you want to model the inverse correlation matrix, aka a precision matrix, then use the wishart.

functions {
matrix onion_cholesky_lp(
        int N,
        vector r_2,
        row_vector l_omega,
        data vector shape1,
        data vector shape2) {       
  {
    matrix[n_pred, n_pred] L_Omega = rep_matrix(0, N, N);

    int start = 1;
    int end = 2;

    L_Omega[1,1] = 1.0;
    L_Omega[2,1] = 2.0 * r_2[1] - 1.0;
    L_Omega[2,2] = sqrt(1.0 - square(L_Omega[2,1]));
    for (k in 2:N - 1) {
      int kp1 = k + 1;
      row_vector[k] l_row = segment(l_omega, start, k);
      real scale = sqrt(r_2[k] / dot_self(l_row));
      L_Omega[kp1, 1:k] = l_row[1:k] * scale;
      L_Omega[kp1, kp1] = sqrt(1.0 - r_2[k]);
      start = end + 1;
      end = start + k - 1;
    } 
    target += std_normal_lpdf(l_omega);
    target += beta_lpdf(r_2, shape1, shape2);

    return L_Omega;
  }
}
}
data {
  int<lower=1> N;
  array[N] vector[2] index;
  vector[N] y_obs;
}
transformed data {
// let's make L a uniform distribution over cholesky corr factors
// impliles eta = 1;
  real<lower=0> eta = 1;
  real<lower = 0> alpha = eta + (K - 2) / 2.0;
  vector<lower = 0>[K-1] shape1;
  vector<lower = 0>[K-1] shape2;

  shape1[1] = alpha;
  shape2[1] = alpha;
  for(k in 2:(K-1)) {
    alpha = alpha - 0.5;
    shape1[k] = k / 2.0;
    shape2[k] = alpha;
  }
}

parameters{
  real<lower=0> sigma;
  real<lower=0> length_scale;
 row_vector[choose(N, 2) - 1]  l_omega_p;
vector<lower = 0,upper = 1>[N - 1] R2;
 vector<lower=0>[N] sds;
 // cov_matrix[N] KW;
}

transformed parameters{
  matrix[N, N] L_KW = onion_cholesky_lp(N, R2, l_omega_p, shape1, shape2);
  matrix[N, N] L_K = cholesky_decompose(gp_exponential_cov(index, sigma, length_scale) + diag_matrix(rep_vector(1e-9, N)));
}

model{
  sigma ~ gamma(1, 1);
  length_scale ~ gamma(1, 1);
  sds ~ exponential(1);
  L_KW ~ inv_wishart_cholesky(N, diag_pre_multiply(sds, L_K));
  y_obs ~ multi_normal_cholesky(rep_vector(0, N), L_KW);
}

Thank you so much for this. A few last tweaks.

In the function onion_cholesky_lp in your code there is the following:

...
matrix[n_pred, n_pred] L_Omega = rep_matrix(0, N, N);
...
beta_lpdf(r_2, shape1, shape2);

should they be like this instead?:

...
matrix[N, N] L_Omega = rep_matrix(0, N, N);
...
beta_lpdf(r_2 | shape1, shape2);

also, when you write:

  real<lower = 0> alpha = eta + (K - 2) / 2.0;
  vector<lower = 0>[K-1] shape1;
  vector<lower = 0>[K-1] shape2;

what is K? It is not defined anywhere else.

Thank you

Yes.

K should be N

Perfect. Thank you so much, everything seems to work now.
Just to clarify, the parameter L_KW which is distributed as a inv_wishart_cholesky represents the cholesky decomposition of the covariance matrix.
If, from the posterior distribution, I need the actual covariance matrix I can simply do crossprod(L_KW). Am I right?

Sorry, no, it should be

transformed parameters{
  matrix[N, N] L_KW = diag_pre_multiply(sds, onion_cholesky_lp(N, R2, l_omega_p, shape1, shape2));
  matrix[N, N] L_K = cholesky_decompose(gp_exponential_cov(index, sigma, length_scale) + diag_matrix(rep_vector(1e-9, N)));
model {
...
L_KW ~ inv_wishart_cholesky(N, L_K);
...
}
}

Now L_KW represents the cholesky decomposition of the covariance matrix. To get the covariance matrix it’s a bit faster to do multiply_lower_tri_self_transpose(L_KW)

We have a triangular form specialization,

matrix multiply_lower_tri_self_transpose(matrix x)

see: 6.5 Dot products and specialized products | Stan Functions Reference,

Stan doesn’t exploit conjugacy (Wishart/precision, inverse Wishart/covariance), so (inverse) Wisharts are not the only priors available for positive definite matrices. In our User’s Guide chapter on regression, we introduce the LKJ prior on correlation matrices which can shrink toward unit correlation matrix and then you can put an independent prior on scale.

Thank you all for helping me on this.
This is how the final code looks like

functions {
matrix onion_cholesky_lp(
        int N,
        vector r_2,
        row_vector l_omega,
        data vector shape1,
        data vector shape2) {       
  {
    matrix[N, N] L_Omega = rep_matrix(0, N, N);

    int start = 1;
    int end = 2;

    L_Omega[1,1] = 1.0;
    L_Omega[2,1] = 2.0 * r_2[1] - 1.0;
    L_Omega[2,2] = sqrt(1.0 - square(L_Omega[2,1]));
    for (k in 2:N - 1) {
      int kp1 = k + 1;
      row_vector[k] l_row = segment(l_omega, start, k);
      real scale = sqrt(r_2[k] / dot_self(l_row));
      L_Omega[kp1, 1:k] = l_row[1:k] * scale;
      L_Omega[kp1, kp1] = sqrt(1.0 - r_2[k]);
      start = end + 1;
      end = start + k - 1;
    } 
    target += std_normal_lpdf(l_omega);
    target += beta_lpdf(r_2 | shape1, shape2);

    return L_Omega;
  }
}
}

data {
  int<lower=1> N;
  array[N] vector[2] index;
  vector[N] y_obs;
}

transformed data {
// let's make L a uniform distribution over cholesky corr factors
// impliles eta = 1;
  real<lower=0> eta = 1;
  real<lower = 0> alpha = eta + (N - 2) / 2.0;
  vector<lower = 0>[N-1] shape1;
  vector<lower = 0>[N-1] shape2;

  shape1[1] = alpha;
  shape2[1] = alpha;
  for(k in 2:(N-1)) {
    alpha = alpha - 0.5;
    shape1[k] = k / 2.0;
    shape2[k] = alpha;
  }
}

parameters{
  real<lower=0> sigma;
  real<lower=0> length_scale;
  row_vector[choose(N, 2) - 1]  l_omega_p;
  vector<lower = 0,upper = 1>[N - 1] R2;
  vector<lower=0>[N] sds;
 // cov_matrix[N] KW;
}

transformed parameters{
  matrix[N, N] L_KW = diag_pre_multiply(sds, onion_cholesky_lp(N, R2, l_omega_p, shape1, shape2));
  matrix[N, N] L_K = cholesky_decompose(gp_exponential_cov(index, sigma, length_scale) + diag_matrix(rep_vector(1e-9, N)));
}

model{
  sigma ~ gamma(1, 1);
  length_scale ~ gamma(1, 1);
  sds ~ exponential(1);
  L_KW ~ inv_wishart_cholesky(N, L_K);
  y_obs ~ multi_normal_cholesky(rep_vector(0, N), L_KW);
}

generated quantities {
  vector[N] y_new = multi_normal_cholesky_rng(rep_vector(0, N), L_KW);

  matrix[N,N] cov_mat = multiply_lower_tri_self_transpose(L_KW);

}

I still need to go through the function onion_cholesky_lp and fully understand what it does. Would it be right to say, from a very high level point of view, that it is necessary to avoid the LDLT_Factor of random variable is not positive definite. error?