Multivariate Mixture (HMM) Parameter Ordering to Prevent Label Switching

Hello all,

First I would like to offer a sincere thank you to all the developers and contributors to Stan. This is an amazing platform and I am grateful for any assistance you can offer me here.

I am working with a collection of univariate and multivariate mixture models with a finite number of states. In the univariate case I am comfortable implementing a label-switching solution using an ordered vector, per Chapter 13 of the Stan Reference Manual and Michael Betancourt’s article on identification issues.

Where I run into trouble, though, is transitioning to the multivariate case. I implemented Dr. Lieu’s model, but the solution seems contrary to my intuition.

As implemented below, the ordering is applied to the mu’s across the M dimensions within each state. But there is no reason the mu’s should be ordered across dimensions. What I am hoping for is an ordering of the mu’s of a particular dimension across the K states, analogous to the univariate case.

Am I missing a nuance that makes these two perspectives interchangeable?

  data {
    int<lower=1> N;                      // number of observations
    int<lower=1> M;                      // number of dimensions
    int<lower=1> K;                      // number of states
    vector[M] y[N];                      // N observations in M dimensions
  }
  parameters {
    simplex[K] theta;                    // mixing proportions

    ordered[M] mu[K];                    // <-- This seems counter to intuition. I would like the
                                         //     ordering to apply across the K states, not across
                                         //     the M dimensions of the data.

    cholesky_factor_corr[M] L_Omega[K];  // Cholesky factors of the correlation matrices
    vector<lower=0>[M] tau[K];           // scales
  }
  transformed parameters {
    matrix[M, M] L_Sigma[K];             // Cholesky factors of the covariance matrices
    for (k in 1:K)
      L_Sigma[k] = diag_pre_multiply(tau[k], L_Omega[k]);
  }
  model {
    vector[K] log_theta = log(theta);    // cache log(theta) for use in the loop
    for (k in 1:K) {
      mu[k] ~ normal(0, 3);
      L_Omega[k] ~ lkj_corr_cholesky(2);
      tau[k] ~ lognormal(0, 2);
    }
    for (i in 1:N) {
      vector[K] log_prob_sum = log_theta;  // start with the mixing proportions
      for (k in 1:K)
        log_prob_sum[k] += multi_normal_cholesky_lpdf(y[i] | mu[k], L_Sigma[k]);  // add the log density for each state
      target += log_sum_exp(log_prob_sum);  // marginalize over states with a numerically stable log-sum-exp
    }
  }

Thank you again!


You just need to break the non-identifiability somehow, and in more than one dimension there is no unique way of doing that. Ordering any single component of the mixture means is sufficient.
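For example, one way to do this (a sketch, not code from this thread; `mu_dim1` and `mu_rest` are hypothetical names) is to order just the first coordinate of the means across the K states and leave the remaining coordinates unconstrained, then reassemble the full mean vectors:

  parameters {
    ordered[K] mu_dim1;          // first coordinate of each state's mean, ordered across states
    vector[M - 1] mu_rest[K];    // remaining M - 1 coordinates, unconstrained
    // ... theta, L_Omega, tau as before ...
  }
  transformed parameters {
    vector[M] mu[K];             // reassembled state means
    for (k in 1:K) {
      mu[k][1] = mu_dim1[k];
      for (m in 2:M)
        mu[k][m] = mu_rest[k][m - 1];
    }
  }

The rest of the model can stay as it is, since it only ever refers to `mu[k]`.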

In any case, pay careful attention to fit validation: when there are more than two components, a myriad of new pathologies arise that make exchangeable mixtures extraordinarily difficult to fit, even with an ordering that avoids the label-switching problem.