Identifiability with Mixture Models



I am trying to get some information about identifiability in Bayesian inference, especially with mixture models. I have found a paper on this topic ("Markov Chain Monte Carlo Methods and the Label Switching Problem in Bayesian Mixture Modeling"), but I feel like I am not any smarter than before.

Jasra, Holmes and Stephens write:

“One of the main challenges of a Bayesian analysis
using mixtures is the nonidentifiability of the components.
That is, if exchangeable priors are placed upon
the parameters of a mixture model, then the resulting
posterior distribution will be invariant to permutations
in the labelling of the parameters. As a result,
the marginal posterior distributions for the parameters
will be identical for each mixture component. Therefore,
during MCMC simulation, the sampler encounters
the symmetries of the posterior distribution and the
interpretation of the labels switches. It is then meaningless
to draw inference directly from MCMC output
using ergodic averaging. Label switching significantly
increases the effort required to produce a satisfactory
Bayesian analysis of the data, but is a prerequisite of
convergence of an MCMC sampler and therefore must
be addressed.”

Is there a simpler way to explain this? Any help would be appreciated!


Note that even once the label switching has been removed, exchangeable mixture models still exhibit more subtle non-identifiabilities that make them very hard to fit.


Suppose you have a mixture of two components:

p(y | mu, sigma, lambda)
  = lambda * normal(y | mu[1], sigma[1]) 
    + (1 - lambda) * normal(y | mu[2], sigma[2])

The model isn’t identifiable as written because the parameter values

theta1 = (mu[1], sigma[1], mu[2], sigma[2], lambda)
theta2 = (mu[2], sigma[2], mu[1], sigma[1], 1 - lambda)

produce exactly the same likelihood value. If the prior for (mu[1], sigma[1]) is the same as that for (mu[2], sigma[2]), then the prior does not break the tie and you still have non-identifiability in the posterior. What we normally recommend is an asymmetric prior that orders mu[1] < mu[2] to identify the model. Now only one of theta1 or theta2 above is possible. The case study that Michael Betancourt (aka @betanalpha) links has much more detail.
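The relabeling argument above can be checked numerically. The following is a minimal sketch in plain Python, not Stan code: the data values, the parameter values, and the helper names (`normal_pdf`, `mixture_log_lik`, `relabel`, `satisfies_ordering`) are all made up for illustration. It evaluates the two-component likelihood at theta1 and at its relabeled twin theta2, then shows that the ordering mu[1] < mu[2] admits only one of the two.

```python
import math


def normal_pdf(y, mu, sigma):
    """Density of normal(y | mu, sigma)."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_log_lik(ys, mu, sigma, lam):
    """Log likelihood of the two-component mixture over all observations."""
    return sum(
        math.log(
            lam * normal_pdf(y, mu[0], sigma[0])
            + (1.0 - lam) * normal_pdf(y, mu[1], sigma[1])
        )
        for y in ys
    )


def relabel(mu, sigma, lam):
    """Swap the component labels: theta1 -> theta2."""
    return (mu[1], mu[0]), (sigma[1], sigma[0]), 1.0 - lam


def satisfies_ordering(mu):
    """The identifying constraint mu[1] < mu[2] (0-based indices here)."""
    return mu[0] < mu[1]


# Hypothetical data and parameter values, chosen only for illustration.
ys = [-1.3, 0.2, 0.9, 3.1, 4.4]
theta1 = ((0.0, 4.0), (1.0, 0.5), 0.7)  # (mu, sigma, lambda)
theta2 = relabel(*theta1)

ll1 = mixture_log_lik(ys, *theta1)
ll2 = mixture_log_lik(ys, *theta2)

# Same likelihood under both labelings (up to floating-point rounding).
print("equal log likelihood:", math.isclose(ll1, ll2))

# Only one of the two labelings respects the ordering constraint.
print("theta1 ordered:", satisfies_ordering(theta1[0]))
print("theta2 ordered:", satisfies_ordering(theta2[0]))
```

In Stan itself, the ordering is usually imposed by declaring the means as an `ordered` vector rather than by checking it after the fact; the check here only illustrates why the constraint leaves a single labeling.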