LDLT_Factor of random variable is not positive definite

marcopark90 · December 14, 2023, 8:03pm

Hi everyone,

I am trying to fit a Gaussian Process with a Covariance function specified by a prior Wishart distribution.
I am simulating some random data in R and then fit the model. I always get the error:
LDLT_Factor of random variable is not positive definite.
which is weird because the Covariance Matrix should be positive definite and symmetric by definition.

Here the full code:

library(rstan)
library(fields)

set.seed(999)

options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)

N <- 100

x <- seq(0, 1, length.out = sqrt(N))

index <- structure(as.matrix(expand.grid(x, x)))

K <- Exponential(as.matrix(dist(index)), alpha = .5)

r_w <- rWishart(N, N, K)[,,1]
rmv <- drop(rmvnorm(n = 1, mu = 0, Sigma = r_w))


fit <-
  stan(
    file = "./example1.stan",
    data = list(N = N,
                index = index,
                y_obs = rmv),
    chains = 4,
    iter = 1000
  )

and the relative stan code:

data {
  int<lower=1> N;
  array[N] vector[2] index;
  vector[N] y_obs;
}


parameters{
  real<lower=0> sigma;
  real<lower=0> length_scale;
  cov_matrix[N] KW;
}

transformed parameters{
  cov_matrix[N] K = gp_exponential_cov(index, sigma, length_scale) + diag_matrix(rep_vector(1e-9, N));
}

model{
  sigma ~ gamma(1, 1);
  length_scale ~ gamma(1, 1);
  KW ~ wishart(N, K);
  
  y_obs ~ multi_normal(rep_vector(0, N), KW);
}

Is there anything I am doing wrong?

Thank you,
Marco

spinkney · December 14, 2023, 9:17pm

Unfortunately, large covariance matrices have initialization problems. I believe if you try this with N < 30 then it will work. Once you get over a dimension of ~30 then you will start to see this error.

You can use the Using the onion method in occupancy model hierarchical model - #2 by spinkney.

This will give you a cholesky factor that you can multiply by the transpose of itself to get a corr mat. Then multiply by the vector of variances to get a cov mat.

Or you can use the wishart_cholesky directly

functions {
matrix onion_cholesky_lp(
		       int N,
                       real eta,
		       vector r_2,
		       row_vector l_omega) {       
  {
    matrix[n_pred, n_pred] L_Omega = rep_matrix(0, N, N);
    real alpha = eta + (N - 2.0) / 2.0;
    vector[N - 1] shape1;
    vector[N - 1] shape2;
 
     shape1[1] = alpha;
     shape2[1] = alpha;

    int start = 1;
    int end = 2;

    L_Omega[1,1] = 1.0;
    L_Omega[2,1] = 2.0 * r_2[1] - 1.0;
    L_Omega[2,2] = sqrt(1.0 - square(L_Omega[2,1]));
    for (k in 2:N - 1) {
      int kp1 = k + 1;
      row_vector[k] l_row = segment(l_omega, start, k);
      real scale = sqrt(r_2[k] / dot_self(l_row));
      L_Omega[kp1, 1:k] = l_row[1:k] * scale;
      L_Omega[kp1, kp1] = sqrt(1.0 - r_2[k]);
      
     alpha -= 0.5;
    shape1[k] = k / 2.0;
    shape2[k] = alpha;

      start = end + 1;
      end = start + k - 1;
    }

    target += beta_lpdf(r_2, shape1, shape2);

    return L_Omega;
  }
}
}
data {
  int<lower=1> N;
  array[N] vector[2] index;
  vector[N] y_obs;
  real<lower=0> eta;
}


parameters{
  real<lower=0> sigma;
  real<lower=0> length_scale;
 row_vector[choose(N, 2) - 1]  l_omega_p;
vector<lower = 0,upper = 1>[N - 1] R2;
 // cov_matrix[N] KW;
}

transformed parameters{
  matrix[N, N] L_KW = onion_cholesky(N, eta, R2, l_omega_p);
  matrix[N, N] L_K = cholesky_decompose(gp_exponential_cov(index, sigma, length_scale) + diag_matrix(rep_vector(1e-9, N)));
}

model{
  sigma ~ gamma(1, 1);
  length_scale ~ gamma(1, 1);
  L_KW ~ wishart_cholesky(N, L_K);
  
  y_obs ~ multi_normal_cholesky(rep_vector(0, N), L_KW);
}

marcopark90 · December 15, 2023, 12:56pm

Thank you very much for your answer,
One last question, you wrote:

  L_KW ~ wishart_cholesky(N, L_K);
  
  y_obs ~ multi_normal_cholesky(rep_vector(0, N), L_KW);

should it be this instead?:

  L_KW ~ wishart_cholesky(N, L_K);
  
  y_obs ~ multi_normal_cholesky(rep_vector(0, N), cholesky_decompose(L_KW));

Thank you,
Marco

spinkney · December 15, 2023, 2:58pm

No, L_KW is already a cholesky_factor. A few additional things. I updated so that the method outputs a uniform cholesky factor of correlation matrices. You can then apply another prior on top. Then, to get a cholesky factor of a covariance matrix you need to multiply by a vector of standard deviations.
Lastly, the inverse wishart is what is the proper distribution for a covariance matrix. If you want to model the inverse correlation matrix, aka a precision matrix, then use the wishart.

functions {
matrix onion_cholesky_lp(
        int N,
        vector r_2,
        row_vector l_omega,
        data vector shape1,
        data vector shape2) {       
  {
    matrix[n_pred, n_pred] L_Omega = rep_matrix(0, N, N);

    int start = 1;
    int end = 2;

    L_Omega[1,1] = 1.0;
    L_Omega[2,1] = 2.0 * r_2[1] - 1.0;
    L_Omega[2,2] = sqrt(1.0 - square(L_Omega[2,1]));
    for (k in 2:N - 1) {
      int kp1 = k + 1;
      row_vector[k] l_row = segment(l_omega, start, k);
      real scale = sqrt(r_2[k] / dot_self(l_row));
      L_Omega[kp1, 1:k] = l_row[1:k] * scale;
      L_Omega[kp1, kp1] = sqrt(1.0 - r_2[k]);
      start = end + 1;
      end = start + k - 1;
    } 
    target += std_normal_lpdf(l_omega);
    target += beta_lpdf(r_2, shape1, shape2);

    return L_Omega;
  }
}
}
data {
  int<lower=1> N;
  array[N] vector[2] index;
  vector[N] y_obs;
}
transformed data {
// let's make L a uniform distribution over cholesky corr factors
// impliles eta = 1;
  real<lower=0> eta = 1;
  real<lower = 0> alpha = eta + (K - 2) / 2.0;
  vector<lower = 0>[K-1] shape1;
  vector<lower = 0>[K-1] shape2;

  shape1[1] = alpha;
  shape2[1] = alpha;
  for(k in 2:(K-1)) {
    alpha = alpha - 0.5;
    shape1[k] = k / 2.0;
    shape2[k] = alpha;
  }
}

parameters{
  real<lower=0> sigma;
  real<lower=0> length_scale;
 row_vector[choose(N, 2) - 1]  l_omega_p;
vector<lower = 0,upper = 1>[N - 1] R2;
 vector<lower=0>[N] sds;
 // cov_matrix[N] KW;
}

transformed parameters{
  matrix[N, N] L_KW = onion_cholesky_lp(N, R2, l_omega_p, shape1, shape2);
  matrix[N, N] L_K = cholesky_decompose(gp_exponential_cov(index, sigma, length_scale) + diag_matrix(rep_vector(1e-9, N)));
}

model{
  sigma ~ gamma(1, 1);
  length_scale ~ gamma(1, 1);
  sds ~ exponential(1);
  L_KW ~ inv_wishart_cholesky(N, diag_pre_multiply(sds, L_K));
  y_obs ~ multi_normal_cholesky(rep_vector(0, N), L_KW);
}

marcopark90 · December 15, 2023, 4:56pm

Thank you so much for this. A few last tweaks.

In the function onion_cholesky_lp in your code there is the following:

...
matrix[n_pred, n_pred] L_Omega = rep_matrix(0, N, N);
...
beta_lpdf(r_2, shape1, shape2);

should they be like this instead?:

...
matrix[N, N] L_Omega = rep_matrix(0, N, N);
...
beta_lpdf(r_2 | shape1, shape2);

also, when you write:

  real<lower = 0> alpha = eta + (K - 2) / 2.0;
  vector<lower = 0>[K-1] shape1;
  vector<lower = 0>[K-1] shape2;

what is K? It is not defined anywhere else.

Thank you

spinkney · December 15, 2023, 4:57pm

Yes.

K should be N

marcopark90 · December 15, 2023, 5:24pm

Perfect. Thank you so much, everything seems to work now.
Just to clarify, the parameter L_KW which is distributed as a inv_wishart_cholesky represents the cholesky decomposition of the covariance matrix.
If, from the posterior distribution, I need the actual covariance matrix I can simply do crossprod(L_KW). Am I right?

spinkney · December 15, 2023, 5:30pm

Sorry, no, it should be

transformed parameters{
  matrix[N, N] L_KW = diag_pre_multiply(sds, onion_cholesky_lp(N, R2, l_omega_p, shape1, shape2));
  matrix[N, N] L_K = cholesky_decompose(gp_exponential_cov(index, sigma, length_scale) + diag_matrix(rep_vector(1e-9, N)));
model {
...
L_KW ~ inv_wishart_cholesky(N, L_K);
...
}
}

Now L_KW represents the cholesky decomposition of the covariance matrix. To get the covariance matrix it’s a bit faster to do multiply_lower_tri_self_transpose(L_KW)

Bob_Carpenter · December 15, 2023, 5:40pm

We have a triangular form specialization,

matrix multiply_lower_tri_self_transpose(matrix x)

see: 6.5 Dot products and specialized products | Stan Functions Reference,

Stan doesn’t exploit conjugacy (Wishart/precision, inverse Wishart/covariance), so (inverse) Wisharts are not the only priors available for positive definite matrices. In our User’s Guide chapter on regression, we introduce the LKJ prior on correlation matrices which can shrink toward unit correlation matrix and then you can put an independent prior on scale.

marcopark90 · December 15, 2023, 6:22pm

Thank you all for helping me on this.
This is how the final code looks like

functions {
matrix onion_cholesky_lp(
        int N,
        vector r_2,
        row_vector l_omega,
        data vector shape1,
        data vector shape2) {       
  {
    matrix[N, N] L_Omega = rep_matrix(0, N, N);

    int start = 1;
    int end = 2;

    L_Omega[1,1] = 1.0;
    L_Omega[2,1] = 2.0 * r_2[1] - 1.0;
    L_Omega[2,2] = sqrt(1.0 - square(L_Omega[2,1]));
    for (k in 2:N - 1) {
      int kp1 = k + 1;
      row_vector[k] l_row = segment(l_omega, start, k);
      real scale = sqrt(r_2[k] / dot_self(l_row));
      L_Omega[kp1, 1:k] = l_row[1:k] * scale;
      L_Omega[kp1, kp1] = sqrt(1.0 - r_2[k]);
      start = end + 1;
      end = start + k - 1;
    } 
    target += std_normal_lpdf(l_omega);
    target += beta_lpdf(r_2 | shape1, shape2);

    return L_Omega;
  }
}
}

data {
  int<lower=1> N;
  array[N] vector[2] index;
  vector[N] y_obs;
}

transformed data {
// let's make L a uniform distribution over cholesky corr factors
// impliles eta = 1;
  real<lower=0> eta = 1;
  real<lower = 0> alpha = eta + (N - 2) / 2.0;
  vector<lower = 0>[N-1] shape1;
  vector<lower = 0>[N-1] shape2;

  shape1[1] = alpha;
  shape2[1] = alpha;
  for(k in 2:(N-1)) {
    alpha = alpha - 0.5;
    shape1[k] = k / 2.0;
    shape2[k] = alpha;
  }
}

parameters{
  real<lower=0> sigma;
  real<lower=0> length_scale;
  row_vector[choose(N, 2) - 1]  l_omega_p;
  vector<lower = 0,upper = 1>[N - 1] R2;
  vector<lower=0>[N] sds;
 // cov_matrix[N] KW;
}

transformed parameters{
  matrix[N, N] L_KW = diag_pre_multiply(sds, onion_cholesky_lp(N, R2, l_omega_p, shape1, shape2));
  matrix[N, N] L_K = cholesky_decompose(gp_exponential_cov(index, sigma, length_scale) + diag_matrix(rep_vector(1e-9, N)));
}

model{
  sigma ~ gamma(1, 1);
  length_scale ~ gamma(1, 1);
  sds ~ exponential(1);
  L_KW ~ inv_wishart_cholesky(N, L_K);
  y_obs ~ multi_normal_cholesky(rep_vector(0, N), L_KW);
}

generated quantities {
  vector[N] y_new = multi_normal_cholesky_rng(rep_vector(0, N), L_KW);

  matrix[N,N] cov_mat = multiply_lower_tri_self_transpose(L_KW);

}

I still need to go through the function onion_cholesky_lp and fully understand what it does. Would it be right to say, from a very high level point of view, that it is necessary to avoid the LDLT_Factor of random variable is not positive definite. error?

Topic		Replies	Views
Trouble with prior selection for multivariate normal-inverse wishart analysis Modeling	11	3254	October 22, 2018
Covariance prior via lkj_corr_cholesky Modeling fitting-issues , covariance	4	544	December 21, 2023
Gaussian LDA Modeling fitting-issues	18	3243	August 30, 2017
Prior for a covariance matrix Modeling techniques , fitting-issues , specification , math	9	2090	July 5, 2020
Prior for Variance Covariance Matrix and Error obtained in RStan regarding that Modeling rstan	1	1014	April 27, 2021

LDLT_Factor of random variable is not positive definite

Related topics