What does it mean to say that a model is "unidentifiable"?

Given a fitted bimodal Gaussian mixture model, how can I verify from the model parameters whether the model is identifiable or not?

It essentially means that the model parameters remain ambiguous no matter how much data you have. If you try to fit a Gaussian Process to pure noise for instance, the noise level and the length scale are not jointly identifiable because the data can be explained just as well as pure noise (the “correct” inference) or by a very rapidly varying function with no noise.

In your case, whether or not it’s been fitted already, and what the estimated parameter values are, isn’t relevant, since identifiability is a property of the model itself, not of any particular instance of it. GMMs in general are not identifiable because there is no difference between \theta_1 \mathcal{N}(0, 1 / \theta_1^2) + (1 - \theta_1)\mathcal{N}(0, 1/(1 - \theta_1)^2) and \mathcal{N}(0, 1), where the latter is just a degenerate mixture model with \theta_1 = 1.

There is a symmetry which makes it impossible to distinguish between various different parameter settings.
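
One concrete instance of such a symmetry is label switching: swapping the component labels of a two-component mixture changes the parameter vector but not the density. The snippet below is only a minimal sketch of that idea; the weights, locations and scales are made-up illustrative values, and `normal_pdf` and `mixture_pdf` are hypothetical helper names, not part of any library.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Density of a univariate normal with mean mu and standard deviation sigma."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def mixture_pdf(x, weight, mu, sigma):
    """Two-component Gaussian mixture; `weight` belongs to component 0."""
    return (weight * normal_pdf(x, mu[0], sigma[0])
            + (1.0 - weight) * normal_pdf(x, mu[1], sigma[1]))

x = np.linspace(-6.0, 6.0, 201)

# Illustrative parameter setting (assumed values, not from a fitted model).
original = mixture_pdf(x, 0.3, (-1.0, 2.0), (0.5, 1.5))

# Same mixture with the two component labels swapped.
swapped = mixture_pdf(x, 0.7, (2.0, -1.0), (1.5, 0.5))

# The two parameter vectors differ, but the densities agree everywhere.
same_density = np.allclose(original, swapped)
print(same_density)
```

Because both settings assign the same density to every data set, no amount of data can favour one labelling over the other.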


Hello! This topic comes up quite regularly. The following posts and links seem relevant:

and the ever relevant:


I don’t think this is accurate. A two-component Gaussian mixture is made identifiable by a simple ordering constraint on the locations.
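
To illustrate what the ordering constraint buys you, here is a rough sketch that maps two label-swapped parameter settings to one canonical setting by sorting on the locations. `canonical_order` is a hypothetical helper and the numbers are made up; in a real model the constraint would normally be imposed during fitting rather than applied afterwards like this.

```python
import numpy as np

def canonical_order(weights, mus, sigmas):
    """Relabel components so that the locations are in increasing order."""
    order = np.argsort(mus)
    return weights[order], mus[order], sigmas[order]

# Two label-swapped descriptions of the same mixture (assumed values).
a = canonical_order(np.array([0.3, 0.7]),
                    np.array([-1.0, 2.0]),
                    np.array([0.5, 1.5]))
b = canonical_order(np.array([0.7, 0.3]),
                    np.array([2.0, -1.0]),
                    np.array([1.5, 0.5]))

# After ordering by location, both collapse to the same parameter vector.
same_parameters = all(np.array_equal(p, q) for p, q in zip(a, b))
print(same_parameters)
```

Note that sorting only picks a unique labelling when the locations are distinct; components that share a location are not separated by it.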


Typically you just fit your model with at least 4 chains, and you’ll see the R-hat values of your parameters converge to something close to 1 if your model is identified. It’s unlikely to happen by accident.
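
As a rough sketch of why this check works, the snippet below computes the basic Gelman-Rubin R-hat on synthetic draws. Everything here is an assumption made for illustration: the draws are simulated with NumPy rather than taken from a fitted model, `rhat` is a hypothetical helper, and it implements the classic (non-split, non-rank-normalized) statistic, which is simpler than what current samplers report.

```python
import numpy as np

def rhat(chains):
    """Basic Gelman-Rubin R-hat for an array of shape (n_chains, n_draws)."""
    chains = np.asarray(chains, dtype=float)
    n_draws = chains.shape[1]
    within = chains.var(axis=1, ddof=1).mean()          # mean within-chain variance
    between = n_draws * chains.mean(axis=1).var(ddof=1)  # between-chain variance
    var_plus = (n_draws - 1) / n_draws * within + between / n_draws
    return np.sqrt(var_plus / within)

rng = np.random.default_rng(0)
n_draws = 1000

# Synthetic draws of one location parameter: all 4 chains use the same labelling.
agree = rng.normal(-1.0, 0.1, size=(4, n_draws))

# Same setup, but two chains settled on the swapped labelling (location near 2).
switched = agree.copy()
switched[2:] = rng.normal(2.0, 0.1, size=(2, n_draws))

rhat_agree = rhat(agree)
rhat_switched = rhat(switched)
print(rhat_agree, rhat_switched)
```

When the chains disagree about the labelling, the between-chain variance dwarfs the within-chain variance and R-hat ends up far above 1, which is the signal described above.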

@hhau Thank you for these.
I went through https://betanalpha.github.io/assets/case_studies/identifying_mixture_models.html#42_breaking_the_labeling_degeneracy_with_non-exchangeable_priors

In this model, how could I draw posterior samples?