# Covariance prior via lkj_corr_cholesky

Hi all, I thought I would dip my toes into Stan by impementing a Truncated Dirichlet Process Gaussian Mixture Model. As a part of this I naturally initially placed an Inverse-Wishart on the mixture covariances, but found that at higher dimensionality there were issues with the covariance matrices not being PD.

So, I did some reading in the Stan manual and modified my model code to instead derive the covariance matrices in terms of correlation matrices drawn from an LKJ and scales drawn from a Cauchy (then `quad_form_diag`). Still, issues with positive definiteness.

So, I did some more reading and instead of directly drawing from the LKJ, I am now using `lkj_corr_cholesky` to get the lower triangular of the correlation matrix, then computing the correlation matrix by `multiply_lower_tri_self_transpose` and the covariance matrix by `quad_form_diag`, as before. However, I am still getting the following for all chains.

``````SAMPLING FOR MODEL 'dpgmm_full_joint' NOW (CHAIN 1).
Chain 1: Rejecting initial value:
Chain 1:   Error evaluating the log probability at the initial value.
Chain 1: Exception: validate transformed params: Sigma[i_0__] is not positive definite.  (in 'model13db33ba60668_dpgmm_full_joint' at line 34)
``````

Clearly, I am missing or have misunderstood something fundamental in my approach, so would appreciate experienced eyes doing some sanity checking.

``````data {
// Num samples, dimensionality & data matrix.
int<lower=1> N;
int<lower=1> D;
vector[D] y[N];

// Concentration hyperparam for the DP.
real<lower=0> alpha;

// DP truncation.
int<lower=1> K;
}

parameters {
// Stick-breaking proportions/weights.
real<lower=0, upper=1> v[K];

// Means and covariances of the mixture components.
// Not directly sampling the covariance matrices due to
// numerical conditioning at higher dimensions - inv wishart
// not yielding PD draws. Alternatively, put priors on
// correlation matrices Omega (via cholesky factors of an LKJ
// prior draw) and scale vector sigma drawn from a cauchy prior.
vector[D] mu[K];
cholesky_factor_corr[D] L[K];
vector<lower=0>[D] sigma[K];
}

transformed parameters {
// Recover covariance matrices Sigma from our correlation
// matrices Omega and scales sigma.
cov_matrix[D] Sigma[K];
corr_matrix[D] Omega[K];
simplex[K] theta;
for (k in 1 : K) {
Omega[k] = multiply_lower_tri_self_transpose(L[k]);
}

// Lets break some sticks.
theta[1] = v[1];

for (k in 2 : K - 1) {
real prod_term;
prod_term = 1.0;
for (j in 1:(k - 1)) {
prod_term = prod_term * (1 - v[j]);
}
theta[k] = v[k] * prod_term;
}

theta[K] = 1.0 - sum(theta[1 : (K - 1)]);
}

model {
// Priors
for (k in 1 : K) {
v[k] ~ beta(1.0, alpha);
mu[k] ~ normal(0.0, 1.0);
L[k] ~ lkj_corr_cholesky(2.0);
sigma[k] ~ cauchy(0.0, 5.0);
}

// Likelihood
for (n in 1 : N) {
vector[K] log_probs;
for (k in 1 : K) {
log_probs[k] = log(theta[k]) + multi_normal_lpdf(y[n] | mu[k], Sigma[k]);
}
target += log_sum_exp(log_probs);
}
}

``````

Many thanks

Welcome to the forums! The default construction of covariance matrices in Stan can lead with high probability to numerical issues at initialization when the matrices are large. Fortunately, there are good ways to construct/initialize these matrices that avoid problems. This post and the links therein are a good place to start:

2 Likes

Hi, many thanks for the response. There is a decent tree of reading materials from that thread, so I shall read through today.

More generally however, shouldnâ€™t the matrix `Omega = L*L'` be positive definite, by virtue of being a quadratic form in `L` (assuming that `L ~ LKJ_chol` is well behaved)?

Ditto for the scales `S = I*s^2`, for `s ~ Cauchy`, in the covariance matrix `Sigma = S * Omega * S`.

Best,
Jack

The problem is likely that at initializtion `L` contains some elements `NaN`.

Yes, from the other threads I suspect this is the case (yet to verify).