Lkj_corr_lpdf: Correlation matrix is not positive definite

I am working with a multivariate normal distribution; my model of interest is a full multivariate normal, but I am starting with something simple and building up from it. The model is the following:

y \sim \mathcal{N}(0,\Sigma)

where the covariance matrix \Sigma is just a diagonal matrix

\Sigma = \begin{pmatrix} \sigma^2 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma^2 \end{pmatrix}

and

\sigma \sim \mathcal{G}(11,0.3)

I am trying to infer \sigma.

This works without the multi-normal, of course, but I am trying to get it to work in this form so I can learn from it and slowly add more complexity. Once I introduce the lkj_corr prior, I start getting errors. Below is the Stan code:

data {
  int<lower=0> N;
  vector[N] y;
}

parameters {
  real<lower=0> sigmaD;
  corr_matrix[N] Omega;
}

transformed parameters {
  vector<lower=0>[N] sigma;
  for (i in 1:N) {
    sigma[i] = sigmaD^2;
  }
}

model {
  sigmaD ~ gamma(11,0.3);
  Omega ~ lkj_corr(3);
  y ~ multi_normal(rep_vector(0,N), quad_form_diag(Omega, sigma));
}

But I get errors such as these:

“Rejecting initial value:
Error evaluating the log probability at the initial value.
Exception: lkj_corr_lpdf: Correlation matrix is not positive definite.
Initialization between (-2, 2) failed after 100 attempts.
Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
Initialization failed.”

I don’t quite understand why this is happening. Can someone explain? Any suggestions?

@spinkney any insights?

I think the guidance is to always use the Cholesky-factorized version, lkj_corr_cholesky, for better numerical stability.

See https://mc-stan.org/docs/stan-users-guide/multivariate-hierarchical-priors.html#optimization-through-cholesky-factorization in the guide.
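
For the model in the original post, a minimal sketch of the Cholesky parameterization (keeping the original priors, and using sigmaD directly as the standard deviation):

data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  real<lower=0> sigmaD;
  cholesky_factor_corr[N] L_Omega; // Cholesky factor of the correlation matrix
}
model {
  sigmaD ~ gamma(11, 0.3);
  L_Omega ~ lkj_corr_cholesky(3);
  // diag_pre_multiply scales the rows of L_Omega by the standard deviations,
  // yielding the Cholesky factor of the covariance matrix
  y ~ multi_normal_cholesky(rep_vector(0, N),
                            diag_pre_multiply(rep_vector(sigmaD, N), L_Omega));
}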

@billWalker is all correct! Agreed, one should work with the Cholesky form.

The problem arises with increasing likelihood as the dimension of the correlation matrix increases. In the code above the dimension is D = N, assigning each data point its own variance, which cannot be estimated.

Even if D < N, when D is large (>30) the sampler may run into problems. I have found two ways around it:

One may set init = 0 in the sampler call, which is a quick hack, but not recommended. The other is to do your own initialization; see the Stan manuals for details. Set the initialization of the correlation matrix, or better its Cholesky factor, to the identity matrix of the same dimension, e.g.

initf2 <- function(chain_id = 1) {
  # cat("chain_id =", chain_id, "\n")
  list(
    # ...
    Omega = diag(D)  # D x D identity matrix
  )
}

and pass it to the sampler via the init argument, e.g. init = initf2.

Thank you very much Bill; however, I don’t think the Cholesky form alone will save this. It seems from @andre.pfeuffer’s response that it has more to do with the size of the correlation matrix.

Thank you very much!! I see that if the matrix is too big, this multivariate normal (or its Cholesky version) will have problems. I might be able to find a way around this for my problem, though!

You can try using the onion method to construct the matrix. This works for sizes much larger than 30.

functions {
  array[] vector create_shapes(int K, real eta) {
    real alpha = eta + (K - 2) / 2.0;
    array[2] vector[K - 1] shape;
    
    shape[1, 1] = alpha;
    shape[2, 1] = alpha;
    for (k in 2 : (K - 1)) {
      alpha -= 0.5;
      shape[1, k] = k / 2.0;
      shape[2, k] = alpha;
    }
    
    return shape;
  }
  
  matrix lkj_onion(int K, row_vector l, vector R2, data array[] vector shape) {
    matrix[K, K] L = rep_matrix(0, K, K); // cholesky_factor corr matrix
    {
      int start = 1;
      int end = 2;
      
      L[1, 1] = 1.0;
      L[2, 1] = 2.0 * R2[1] - 1.0;
      L[2, 2] = sqrt(1.0 - square(L[2, 1]));
      for (k in 2 : (K - 1)) {
        int kp1 = k + 1;
        row_vector[k] l_row = segment(l, start, k);
        real scale = sqrt(R2[k] / dot_self(l_row));
        L[kp1, 1 : k] = l_row[1 : k] * scale;
        L[kp1, kp1] = sqrt(1.0 - R2[k]);
        start = end + 1;
        end = start + k; // last index of the next segment of l, which has length k + 1
      }
    }
    return L;
  }
}
data {
  int<lower=0> K; // dim of corr mat
  real<lower=0> eta; // lkj param
}
transformed data {
  array[2] vector[K - 1] shapes = create_shapes(K, eta);
}
parameters {
  row_vector[choose(K, 2) - 1] l; // do NOT init with 0 for all elements
  vector<lower=0, upper=1>[K - 1] R2; // first element is not really a R^2 but is on (0,1)  
}
transformed parameters {
  cholesky_factor_corr[K] L = lkj_onion(K, l, R2, shapes);
}
model {
  target += std_normal_lpdf(l);
  target += beta_lpdf(R2 | shapes[1], shapes[2]);
}
generated quantities {
  matrix[K, K] Sigma = multiply_lower_tri_self_transpose(L);
}
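
If you want to hook this up to a likelihood like the one in the original post, one hypothetical way (my sketch, not part of the onion construction itself) is to add vector[K] y; to the data block and real<lower=0> sigmaD; to the parameters block, then extend the model block:

model {
  target += std_normal_lpdf(l);
  target += beta_lpdf(R2 | shapes[1], shapes[2]);
  sigmaD ~ gamma(11, 0.3);
  // L is the Cholesky factor of the correlation matrix, so scaling its rows
  // by the standard deviations gives the Cholesky factor of the covariance
  y ~ multi_normal_cholesky(rep_vector(0, K),
                            diag_pre_multiply(rep_vector(sigmaD, K), L));
}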

That looks nice. What’s the intuition of the R2 parameter? Is that controlling the average level of correlation or partial correlations? I also don’t have a good sense of what the create_shapes function is doing.

For the OP: I’ve run into problems with the LKJ prior over the years, mostly on models where the variables are multivariate normal with high correlations (as in a factor model). The correlation parameters then tend to be correlated with one another in weird non-linear ways that Stan has difficulty with.

I am possibly missing something crucial, but I think you are vastly overcomplicating the problem. If you really know your covariance matrix to be diagonal, with the same variance along the diagonal, it’s just a product of identical univariate Gaussians from which you’re trying to estimate the variance. Isn’t this just a textbook problem?

You don’t need to work with the Cholesky factorisation: you can do it by hand, and it’s just the matrix with sigma along the diagonal. But indeed you can write the whole problem as a simple sampler for sigma.
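
Concretely, with \Sigma = \sigma^2 I_N the likelihood factorizes as

\mathcal{N}(y \mid 0, \sigma^2 I_N) = \prod_{i=1}^{N} \mathcal{N}(y_i \mid 0, \sigma^2)

so the whole model collapses to the vectorized statement y ~ normal(0, sigmaD); with no correlation matrix at all.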

More likely I am missing a subtlety.

Thank you for your answer!! As mentioned:

My “real problem” is much more complicated, and the only reason I am doing it like this is so that I can get it to work in a simple case and then build from it. What I am aiming for is not a diagonal matrix, but rather a hidden Markov model with a multivariate normal distribution that has covariation between consecutive observations. But we have to start somewhere!!

Thank you again very much for your wanting to help!