Does this work for dealing with non-identifiability due to permutation symmetry?

Kevin_Van_Horn · September 28, 2018, 5:41pm

Suppose that we have a Gaussian mixture model with N > 1 components. The parameters to infer are

vector[N] mu;  // mean for each component
vector[N] sigma;  // sd for each component

but, as Michael Betancourt discusses (http://mc-stan.org/users/documentation/case-studies/identifying_mixture_models.html), there are identifiability issues: assuming symmetric priors, permuting the component indices leaves the joint probability density unchanged, guaranteeing us at least N! posterior modes.
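The permutation invariance is easy to check numerically. Below is a minimal sketch, assuming equal mixing weights (the weights are not specified above) and made-up data; `normal_pdf` and `mixture_log_lik` are hypothetical helpers written for this illustration, not Stan functions.

```python
import math
from itertools import permutations

def normal_pdf(y, mu, sigma):
    """Density of Normal(mu, sigma) at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_log_lik(ys, mu, sigma):
    """Log-likelihood of a Gaussian mixture with equal weights (an assumption)."""
    n = len(mu)
    return sum(
        math.log(sum(normal_pdf(y, m, s) for m, s in zip(mu, sigma)) / n)
        for y in ys
    )

ys = [-1.3, 0.2, 0.5, 2.9, 3.4]  # made-up data, for illustration only
mu = [-1.0, 0.5, 3.0]
sigma = [0.5, 1.0, 0.7]

base = mixture_log_lik(ys, mu, sigma)

# Evaluate the log-likelihood under every relabelling of the components.
values = []
for perm in permutations(range(len(mu))):
    mu_p = [mu[i] for i in perm]
    sigma_p = [sigma[i] for i in perm]
    values.append(mixture_log_lik(ys, mu_p, sigma_p))

all_equal = all(math.isclose(v, base, rel_tol=1e-9) for v in values)
print(len(values), all_equal)
```

With N = 3 there are 3! = 6 relabellings, and each one gives the same log-likelihood up to floating-point rounding. With a symmetric prior the same holds for the joint density.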

One approach is to impose a constraint that mu[i] < mu[i+1] for all 1 <= i < N. One way of looking at it is to define the canonical form of (mu, sigma) to be the pair of vectors

canonical(mu, sigma) = (mu', sigma')

obtained by permuting indices such that the elements of mu’ are in increasing order. The constraint is then that (mu, sigma) must already be in canonical form.

But what if we simply impose canonical form after sampling is done, that is, take the sample and map (mu, sigma) to canonical(mu, sigma)? Effectively we are saying that (mu1, sigma1) and (mu2, sigma2) are the same point if they have the same canonical form. We’ve moved from a Euclidean space to a space that is locally Euclidean but not globally Euclidean. This could make things hairy if the proposal distribution defined by NUTS were asymmetric, but luckily it is symmetric, so it seems to me that the detailed balance equations should still hold.
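To make the proposal concrete, here is a minimal sketch of the post-hoc mapping, assuming each draw is stored as a pair of Python lists (mu, sigma). The function name `canonical` follows the notation above; `canonicalize_draws` and the example draws are made up for illustration.

```python
def canonical(mu, sigma):
    """Permute indices so that mu is increasing; apply the same permutation to sigma."""
    order = sorted(range(len(mu)), key=lambda i: mu[i])
    return [mu[i] for i in order], [sigma[i] for i in order]

def canonicalize_draws(draws):
    """Map every (mu, sigma) draw to its canonical form."""
    return [canonical(mu, sigma) for mu, sigma in draws]

# Three made-up draws, each using a different labelling of the components.
draws = [
    ([0.1, 2.0, -1.5], [1.0, 0.5, 0.8]),
    ([-1.4, 0.2, 2.1], [0.7, 1.1, 0.4]),
    ([2.2, -1.6, 0.0], [0.5, 0.9, 1.2]),
]

canon = canonicalize_draws(draws)
for mu_c, sigma_c in canon:
    print(mu_c, sigma_c)
```

Each sigma[i] travels with its mu[i], so a component keeps its own mean and sd after relabelling. Exact ties in mu have probability zero under a continuous posterior, so the sketch does not treat them specially.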

Am I missing something here?

andrewgelman · September 28, 2018, 6:22pm

Kevin:

You can do this, but it can make the sampling much slower and convergence much more difficult if the different modes are separated in the posterior distribution.

Kevin_Van_Horn · September 28, 2018, 6:39pm

Not sure I follow you here. When you say “make the sampling much slower,” are you talking about slower iterations, or more iterations to get convergence?

If we consider ourselves to be working in a “canonical” parameter space that is a 1/N! slice of the original space as I proposed, it seems to me that you would want to canonicalize the draws before computing N_eff and R_hat. So you don’t have all those symmetric modes anymore, but you’ve avoided the problems that can occur when you approach the boundaries of the constrained space when explicitly imposing an ordering constraint.
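Here is a rough sketch of what I mean by canonicalizing before computing R_hat. It uses two synthetic "chains" that sit in different label-switched modes, and the classic (non-split) Gelman-Rubin formula rather than the split, rank-normalized version that Stan reports. The chains, the noise level, and the helper names `rhat` and `fake_chain` are assumptions made up for this illustration.

```python
import math
import random

def canonical(mu, sigma):
    """Permute indices so that mu is increasing; apply the same permutation to sigma."""
    order = sorted(range(len(mu)), key=lambda i: mu[i])
    return [mu[i] for i in order], [sigma[i] for i in order]

def rhat(chains):
    """Classic Gelman-Rubin R-hat for equal-length chains of scalar draws."""
    m = len(chains)
    n = len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    # Between-chain variance divided by n, and mean within-chain variance.
    b_over_n = sum((x - grand) ** 2 for x in means) / (m - 1)
    w = sum(
        sum((x - mean) ** 2 for x in c) / (n - 1)
        for c, mean in zip(chains, means)
    ) / m
    var_plus = (n - 1) / n * w + b_over_n
    return math.sqrt(var_plus / w)

rng = random.Random(1)

def fake_chain(centers, n_draws=1000, noise=0.3):
    """Synthetic (mu, sigma) draws scattered around one labelling of the modes."""
    return [
        ([rng.gauss(c, noise) for c in centers], [1.0] * len(centers))
        for _ in range(n_draws)
    ]

chain_a = fake_chain([-2.0, 2.0])  # labelling 1
chain_b = fake_chain([2.0, -2.0])  # labelling 2: same mode, indices swapped

# R-hat for mu[1] (index 0 here) before and after canonicalization.
raw = [[mu[0] for mu, _ in chain] for chain in (chain_a, chain_b)]
canon = [
    [canonical(mu, sigma)[0][0] for mu, sigma in chain]
    for chain in (chain_a, chain_b)
]

rhat_raw = rhat(raw)
rhat_canon = rhat(canon)
print(rhat_raw, rhat_canon)
```

On the raw draws the two chains disagree about mu[1], so R_hat is far above 1. After canonicalization both chains describe the same component and R_hat comes out close to 1. The sketch only covers the diagnostic computed after the fact; it says nothing about how the sampler behaves while it runs.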

In more detail: any path taken in the original space is mapped, via canonical(), to an equivalent path in the canonical space. Any symmetric proposal distribution in the original space is mapped to a symmetric proposal distribution in the canonical space. The probability density at any point in the original space is the same (up to a factor of N!) as the probability density in the canonical space.

betanalpha · October 1, 2018, 1:06am

This approach presumes that you have an exact symmetry in your posterior and technically works because it does the same thing as the constraint (identifying a unique orthant). That said, it will mess up adaptation and diagnostics which don’t know about this symmetry and hence end up in weird configurations. Hence it’s always better to remove it in the model specification itself.

That said, be careful because removing the label switching only peels away the first layer of pathologies in exchangeable mixture models. With more than two components there are myriad more subtle yet equally problematic non-identifiabilities and weak identifiabilities that are nowhere near as easy to manage.

I do not recommend using exchangeable mixture models at all! Non-exchangeable mixture models are great, but once the components are all degenerate you’re in for trouble.
