Identification of mixture of multivariate normal distributions

fitting-issues

#1

Hello,

I am trying to implement a latent profile analysis, i.e. a mixture of multivariate normal distributions.
The first step I took to promote identification is to define the mean of the first variable in the multivariate normal distributions as an ordered vector.

…
parameters {
  ordered[K] mu1;
  vector[D-1] mu_r[K];
 …
}
transformed parameters {
  vector[D] mu[K];
  for (k in 1:K) {
    mu[k][1] = mu1[k];
    for (d in 2:D) mu[k][d] = mu_r[k][d-1];
  }
…
}`

(full model below)

As expected, this is not sufficient to fully identify the model. So I tried deal with the label switching problem by using Stephen’s algorithm (in the R package label.switching). This gives not great but, from what I can tell, OK results. That is, the means of the multivariate normal distributions are no longer multi modal. Still, one can still see that the posterior distributions of the means are constructed from posterior samples that suffered from label switching.
Now I am wondering if the following approach would be OK:

  1. Run the Stan model that partially insures identification through ordering of the first variables of the multivariate normal distributions.
  2. Use Stephen’s label switching algorithm to remove most of the label switching and learn the order of the means other variables
  3. Run an updated Stan model that implements (some of the) ordering constraints learned in step 2.

Do you think this is a sound approach?
I would also welcome any other idea to identify the model.

Thanks in advance!
Guido

Full model
data {
 int N;             
 int D;           # number of dimensions for mvn
 int K;           # number of mixture components
 int sigmas;      # number of variance vectors for mvn (1 or K)
 matrix[N,D] y;   # data
 real v;          # prior for lkj_corr_cholesky
}

transformed data {
  int tau_idx[K];
  int lambda_n;
  for (k in 1:K) tau_idx[k] = K == sigmas ? k : 1;
  lambda_n = K == 1 ? 1 : N;
}

parameters {
  ordered[K] mu1;
  vector[D-1] mu_r[K];
  vector<lower = 0>[D] sigma[sigmas];  
  cholesky_factor_corr[D] L_Omega;
  simplex[K] lambda[lambda_n];
}

transformed parameters {
  vector[D] mu[K];
  cov_matrix[D] Sigma[sigmas];  

  
  for (k in 1:K) {
    mu[k][1] = mu1[k];
    for (d in 2:D) mu[k][d] = mu_r[k][d-1];
  } 
  for (g in 1:sigmas) Sigma[g] = quad_form_diag(multiply_lower_tri_self_transpose(L_Omega), sigma[g]);
}

model {
  L_Omega ~ lkj_corr_cholesky(v);
  
  mu1 ~ normal(0,3);
  for (k in 1:sigmas) {
    sigma[k] ~ normal(0,1);
    }
  for (k in 1:K) mu_r[k] ~ normal(0,3);
  

  for (n in 1:N) {
    vector[K] lps;
    for (k in 1:K) lps[k] = log(lambda[n][k]) + 
                                     multi_normal_lpdf(y[n,] | mu[k], Sigma[tau_idx[k]]);
    target += log_sum_exp(lps);
  }  
   
}