Hi all, I thought I would dip my toes into Stan by impementing a Truncated Dirichlet Process Gaussian Mixture Model. As a part of this I naturally initially placed an Inverse-Wishart on the mixture covariances, but found that at higher dimensionality there were issues with the covariance matrices not being PD.
So, I did some reading in the Stan manual and modified my model code to instead derive the covariance matrices in terms of correlation matrices drawn from an LKJ and scales drawn from a Cauchy (then quad_form_diag
). Still, issues with positive definiteness.
So, I did some more reading and instead of directly drawing from the LKJ, I am now using lkj_corr_cholesky
to get the lower triangular of the correlation matrix, then computing the correlation matrix by multiply_lower_tri_self_transpose
and the covariance matrix by quad_form_diag
, as before. However, I am still getting the following for all chains.
SAMPLING FOR MODEL 'dpgmm_full_joint' NOW (CHAIN 1).
Chain 1: Rejecting initial value:
Chain 1: Error evaluating the log probability at the initial value.
Chain 1: Exception: validate transformed params: Sigma[i_0__] is not positive definite. (in 'model13db33ba60668_dpgmm_full_joint' at line 34)
Clearly, I am missing or have misunderstood something fundamental in my approach, so would appreciate experienced eyes doing some sanity checking.
The latest version of this that I have is below.
data {
// Num samples, dimensionality & data matrix.
int<lower=1> N;
int<lower=1> D;
vector[D] y[N];
// Concentration hyperparam for the DP.
real<lower=0> alpha;
// DP truncation.
int<lower=1> K;
}
parameters {
// Stick-breaking proportions/weights.
real<lower=0, upper=1> v[K];
// Means and covariances of the mixture components.
// Not directly sampling the covariance matrices due to
// numerical conditioning at higher dimensions - inv wishart
// not yielding PD draws. Alternatively, put priors on
// correlation matrices Omega (via cholesky factors of an LKJ
// prior draw) and scale vector sigma drawn from a cauchy prior.
vector[D] mu[K];
cholesky_factor_corr[D] L[K];
vector<lower=0>[D] sigma[K];
}
transformed parameters {
// Recover covariance matrices Sigma from our correlation
// matrices Omega and scales sigma.
cov_matrix[D] Sigma[K];
corr_matrix[D] Omega[K];
simplex[K] theta;
for (k in 1 : K) {
Omega[k] = multiply_lower_tri_self_transpose(L[k]);
Sigma[k] = quad_form_diag(Omega[k], sigma[k]);
}
// Lets break some sticks.
theta[1] = v[1];
for (k in 2 : K - 1) {
real prod_term;
prod_term = 1.0;
for (j in 1:(k - 1)) {
prod_term = prod_term * (1 - v[j]);
}
theta[k] = v[k] * prod_term;
}
theta[K] = 1.0 - sum(theta[1 : (K - 1)]);
}
model {
// Priors
for (k in 1 : K) {
v[k] ~ beta(1.0, alpha);
mu[k] ~ normal(0.0, 1.0);
L[k] ~ lkj_corr_cholesky(2.0);
sigma[k] ~ cauchy(0.0, 5.0);
}
// Likelihood
for (n in 1 : N) {
vector[K] log_probs;
for (k in 1 : K) {
log_probs[k] = log(theta[k]) + multi_normal_lpdf(y[n] | mu[k], Sigma[k]);
}
target += log_sum_exp(log_probs);
}
}
Many thanks