Identifiability with Mixture Models

Jennifer · November 30, 2017, 9:52am

Hi,

I am trying to get some information about identifiability with Bayesian inference and especially with mixture models. I have found a paper on this topic (Markov Chain Monte Carlo Methods and the Label Switching Problem in Bayesian Mixture Modeling) but I feel like I am not any smarter than before.

Jasra, Holmes and Stephens write:

“One of the main challenges of a Bayesian analysis
using mixtures is the nonidentifiability of the components.
That is, if exchangeable priors are placed upon
the parameters of a mixture model, then the resulting
posterior distribution will be invariant to permutations
in the labelling of the parameters. As a result,
the marginal posterior distributions for the parameters
will be identical for each mixture component. Therefore,
during MCMC simulation, the sampler encounters
the symmetries of the posterior distribution and the
interpretation of the labels switches. It is then meaningless
to draw inference directly from MCMC output
using ergodic averaging. Label switching significantly
increases the effort required to produce a satisfactory
Bayesian analysis of the data, but is a prerequisite of
convergence of an MCMC sampler and therefore must
be addressed.”

Is there a simpler way to explain this? Any help would be appreciated!

betanalpha · November 30, 2017, 2:48pm

https://betanalpha.github.io/assets/case_studies/identifying_mixture_models.html

Note that even once the label switching has been removed, exchangeable mixture models still exhibit more subtle non-identifiabilities that make them very hard to fit.

Bob_Carpenter · December 21, 2017, 10:35pm

If you have a mixture of two components

p(y | mu, sigma, lambda)
  = lambda * normal(y | mu[1], sigma[1]) 
    + (1 - lambda) * normal(y | mu[2], sigma[2])

The model isn’t identifiable as written because the parameter values

theta1 = (mu[1], sigma[1], mu[2], sigma[2], lambda)
theta2 = (mu[2], sigma[2], mu[1], sigma[1], 1 - lambda)

produce exactly the same likelihood value. If the prior for (mu[1], sigma[1]) is the same as that for (mu[2], sigma[2]), then the posterior has the same symmetry and you still have non-identifiability. What we normally recommend is an asymmetric prior that orders mu[1] < mu[2] to identify the model. Now only one of theta1 or theta2 above is possible. The case study by Michael Betancourt (aka @betanalpha) linked above has much more detail.
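To make the symmetry concrete, here is a minimal Python sketch. The data and parameter values are made up, and the helper names (`mixture_log_lik`, `swap_labels`, `satisfies_ordering`) are hypothetical, not part of Stan or any library. It checks that swapping the component labels leaves the log-likelihood unchanged, and that the ordering mu[1] < mu[2] admits only one of the two labellings.

```python
import math


def normal_pdf(y, mu, sigma):
    """Density of normal(y | mu, sigma)."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_log_lik(ys, mu, sigma, lam):
    """Log-likelihood of the two-component mixture
    lam * normal(y | mu[0], sigma[0]) + (1 - lam) * normal(y | mu[1], sigma[1]).
    Indices are 0-based here, unlike the 1-based notation in the post."""
    return sum(
        math.log(
            lam * normal_pdf(y, mu[0], sigma[0])
            + (1.0 - lam) * normal_pdf(y, mu[1], sigma[1])
        )
        for y in ys
    )


def swap_labels(mu, sigma, lam):
    """Map theta1 to theta2: exchange the components and replace lam by 1 - lam."""
    return (mu[1], mu[0]), (sigma[1], sigma[0]), 1.0 - lam


def satisfies_ordering(mu):
    """The identifying constraint mu[1] < mu[2] (0-based: mu[0] < mu[1])."""
    return mu[0] < mu[1]


# Made-up data and parameter values, for illustration only.
ys = [-1.3, 0.2, 0.9, 2.4, 3.1]
theta1 = ((0.0, 3.0), (1.0, 0.5), 0.25)  # (mu, sigma, lambda)
theta2 = swap_labels(*theta1)

ll1 = mixture_log_lik(ys, *theta1)
ll2 = mixture_log_lik(ys, *theta2)

# The two labellings give the same log-likelihood (up to floating point).
print("same log-likelihood:", math.isclose(ll1, ll2))

# Only one of the two labellings satisfies the ordering constraint.
print("theta1 ordered:", satisfies_ordering(theta1[0]))
print("theta2 ordered:", satisfies_ordering(theta2[0]))
```

In Stan itself the ordering is usually imposed by declaring the locations with the `ordered` type rather than by checking it after the fact; the check above only illustrates why the constraint rules out one of theta1 and theta2.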
