Covariance prior via lkj_corr_cholesky

JackM · December 20, 2023, 7:55pm

Hi all, I thought I would dip my toes into Stan by impementing a Truncated Dirichlet Process Gaussian Mixture Model. As a part of this I naturally initially placed an Inverse-Wishart on the mixture covariances, but found that at higher dimensionality there were issues with the covariance matrices not being PD.

So, I did some reading in the Stan manual and modified my model code to instead derive the covariance matrices in terms of correlation matrices drawn from an LKJ and scales drawn from a Cauchy (then quad_form_diag). Still, issues with positive definiteness.

So, I did some more reading and instead of directly drawing from the LKJ, I am now using lkj_corr_cholesky to get the lower triangular of the correlation matrix, then computing the correlation matrix by multiply_lower_tri_self_transpose and the covariance matrix by quad_form_diag, as before. However, I am still getting the following for all chains.

SAMPLING FOR MODEL 'dpgmm_full_joint' NOW (CHAIN 1).
Chain 1: Rejecting initial value:
Chain 1:   Error evaluating the log probability at the initial value.
Chain 1: Exception: validate transformed params: Sigma[i_0__] is not positive definite.  (in 'model13db33ba60668_dpgmm_full_joint' at line 34)

Clearly, I am missing or have misunderstood something fundamental in my approach, so would appreciate experienced eyes doing some sanity checking.

The latest version of this that I have is below.

data {
  // Num samples, dimensionality & data matrix.
  int<lower=1> N;
  int<lower=1> D;
  vector[D] y[N];

  // Concentration hyperparam for the DP.
  real<lower=0> alpha;

  // DP truncation.
  int<lower=1> K;
}

parameters {
  // Stick-breaking proportions/weights.
  real<lower=0, upper=1> v[K];

  // Means and covariances of the mixture components.
  // Not directly sampling the covariance matrices due to
  // numerical conditioning at higher dimensions - inv wishart
  // not yielding PD draws. Alternatively, put priors on
  // correlation matrices Omega (via cholesky factors of an LKJ
  // prior draw) and scale vector sigma drawn from a cauchy prior.
  vector[D] mu[K];
  cholesky_factor_corr[D] L[K];
  vector<lower=0>[D] sigma[K];
}

transformed parameters {
  // Recover covariance matrices Sigma from our correlation
  // matrices Omega and scales sigma.
  cov_matrix[D] Sigma[K];
  corr_matrix[D] Omega[K];
  simplex[K] theta;
  for (k in 1 : K) {
    Omega[k] = multiply_lower_tri_self_transpose(L[k]);
    Sigma[k] = quad_form_diag(Omega[k], sigma[k]);
  }


  // Lets break some sticks.
  theta[1] = v[1];

  for (k in 2 : K - 1) {
    real prod_term;
    prod_term = 1.0;
    for (j in 1:(k - 1)) {
      prod_term = prod_term * (1 - v[j]);
    }
    theta[k] = v[k] * prod_term;
  }

  theta[K] = 1.0 - sum(theta[1 : (K - 1)]);
}

model {
  // Priors
  for (k in 1 : K) {
    v[k] ~ beta(1.0, alpha);
    mu[k] ~ normal(0.0, 1.0);
    L[k] ~ lkj_corr_cholesky(2.0);
    sigma[k] ~ cauchy(0.0, 5.0);
  }

  // Likelihood
  for (n in 1 : N) {
    vector[K] log_probs;
    for (k in 1 : K) {
      log_probs[k] = log(theta[k]) + multi_normal_lpdf(y[n] | mu[k], Sigma[k]);
    }
    target += log_sum_exp(log_probs);
  }
}

Many thanks

jsocolar · December 21, 2023, 4:41am

Welcome to the forums! The default construction of covariance matrices in Stan can lead with high probability to numerical issues at initialization when the matrices are large. Fortunately, there are good ways to construct/initialize these matrices that avoid problems. This post and the links therein are a good place to start:

JackM · December 21, 2023, 11:44am

Hi, many thanks for the response. There is a decent tree of reading materials from that thread, so I shall read through today.

More generally however, shouldn’t the matrix Omega = L*L' be positive definite, by virtue of being a quadratic form in L (assuming that L ~ LKJ_chol is well behaved)?

Ditto for the scales S = I*s^2, for s ~ Cauchy, in the covariance matrix Sigma = S * Omega * S.

Best,
Jack

jsocolar · December 21, 2023, 1:00pm

The problem is likely that at initializtion L contains some elements NaN.

JackM · December 21, 2023, 2:05pm

Yes, from the other threads I suspect this is the case (yet to verify).

Topic		Replies	Views
Prior for a covariance matrix Modeling techniques , fitting-issues , specification , math	9	2044	July 5, 2020
Lkj_corr_lpdf: Correlation matrix is not positive definite General	9	925	July 12, 2023
Bad LKJ Initialization in model from "Fast hierarchical Gaussian processes" Modeling specification	2	1462	March 22, 2018
Dealing with constraints on correlations Modeling	2	844	November 2, 2019
Covariance parametrization question Modeling	3	866	February 12, 2019

Covariance prior via lkj_corr_cholesky

Related topics