Covariance prior via lkj_corr_cholesky

Hi all, I thought I would dip my toes into Stan by impementing a Truncated Dirichlet Process Gaussian Mixture Model. As a part of this I naturally initially placed an Inverse-Wishart on the mixture covariances, but found that at higher dimensionality there were issues with the covariance matrices not being PD.

So, I did some reading in the Stan manual and modified my model code to instead derive the covariance matrices in terms of correlation matrices drawn from an LKJ and scales drawn from a Cauchy (then quad_form_diag). Still, issues with positive definiteness.

So, I did some more reading and instead of directly drawing from the LKJ, I am now using lkj_corr_cholesky to get the lower triangular of the correlation matrix, then computing the correlation matrix by multiply_lower_tri_self_transpose and the covariance matrix by quad_form_diag, as before. However, I am still getting the following for all chains.

SAMPLING FOR MODEL 'dpgmm_full_joint' NOW (CHAIN 1).
Chain 1: Rejecting initial value:
Chain 1:   Error evaluating the log probability at the initial value.
Chain 1: Exception: validate transformed params: Sigma[i_0__] is not positive definite.  (in 'model13db33ba60668_dpgmm_full_joint' at line 34)

Clearly, I am missing or have misunderstood something fundamental in my approach, so would appreciate experienced eyes doing some sanity checking.

The latest version of this that I have is below.

data {
  // Num samples, dimensionality & data matrix.
  int<lower=1> N;
  int<lower=1> D;
  vector[D] y[N];

  // Concentration hyperparam for the DP.
  real<lower=0> alpha;

  // DP truncation.
  int<lower=1> K;

parameters {
  // Stick-breaking proportions/weights.
  real<lower=0, upper=1> v[K];

  // Means and covariances of the mixture components.
  // Not directly sampling the covariance matrices due to
  // numerical conditioning at higher dimensions - inv wishart
  // not yielding PD draws. Alternatively, put priors on
  // correlation matrices Omega (via cholesky factors of an LKJ
  // prior draw) and scale vector sigma drawn from a cauchy prior.
  vector[D] mu[K];
  cholesky_factor_corr[D] L[K];
  vector<lower=0>[D] sigma[K];

transformed parameters {
  // Recover covariance matrices Sigma from our correlation
  // matrices Omega and scales sigma.
  cov_matrix[D] Sigma[K];
  corr_matrix[D] Omega[K];
  simplex[K] theta;
  for (k in 1 : K) {
    Omega[k] = multiply_lower_tri_self_transpose(L[k]);
    Sigma[k] = quad_form_diag(Omega[k], sigma[k]);

  // Lets break some sticks.
  theta[1] = v[1];

  for (k in 2 : K - 1) {
    real prod_term;
    prod_term = 1.0;
    for (j in 1:(k - 1)) {
      prod_term = prod_term * (1 - v[j]);
    theta[k] = v[k] * prod_term;

  theta[K] = 1.0 - sum(theta[1 : (K - 1)]);

model {
  // Priors
  for (k in 1 : K) {
    v[k] ~ beta(1.0, alpha);
    mu[k] ~ normal(0.0, 1.0);
    L[k] ~ lkj_corr_cholesky(2.0);
    sigma[k] ~ cauchy(0.0, 5.0);

  // Likelihood
  for (n in 1 : N) {
    vector[K] log_probs;
    for (k in 1 : K) {
      log_probs[k] = log(theta[k]) + multi_normal_lpdf(y[n] | mu[k], Sigma[k]);
    target += log_sum_exp(log_probs);

Many thanks

Welcome to the forums! The default construction of covariance matrices in Stan can lead with high probability to numerical issues at initialization when the matrices are large. Fortunately, there are good ways to construct/initialize these matrices that avoid problems. This post and the links therein are a good place to start:


Hi, many thanks for the response. There is a decent tree of reading materials from that thread, so I shall read through today.

More generally however, shouldn’t the matrix Omega = L*L' be positive definite, by virtue of being a quadratic form in L (assuming that L ~ LKJ_chol is well behaved)?

Ditto for the scales S = I*s^2, for s ~ Cauchy, in the covariance matrix Sigma = S * Omega * S.


The problem is likely that at initializtion L contains some elements NaN.

Yes, from the other threads I suspect this is the case (yet to verify).