Identifiability of Gaussian mixture model

From what I understand, you said earlier that if my betas (the B-spline coefficients) are not exchangeable, then perhaps the mixture locations can be identified. If that is the case, then perhaps I should assign, or try to find, some non-exchangeable priors on the betas to make the mus more identifiable?

That would be the general strategy, but I think we can have an easier time of it by thinking generatively.

Start by considering a single time point. What do the two components look like? Are there scales characterizing the particle sizes? Ideally you want to be able to identify ranges of means and variances for each component so that the two components have very little overlap. For example, you might have one component with means around 1 micrometer and one component with means around 1 mm, with a prior on the variances that prevents the two component distributions from overlapping. Here the informative priors break the exchangeability and give you something that should fit well (and in similar applications does fit well!).
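As a quick sanity check of this kind of prior setup, here is a hypothetical sketch (all scales and spreads are made-up illustration values, not from the original posts) simulating two lognormal size components centered near 1 micrometer and 1 mm. With tight scales on the log axis, the two size distributions essentially never overlap, which is the regime where the labels become identifiable:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scales: one component near 1 micrometer, one near 1 mm
# (sizes in meters). The locations are ~3 orders of magnitude apart,
# and sigma = 0.2 on the log scale keeps each component narrow.
mu_small, sigma_small = np.log(1e-6), 0.2   # ~1 micrometer
mu_large, sigma_large = np.log(1e-3), 0.2   # ~1 mm

small = rng.lognormal(mean=mu_small, sigma=sigma_small, size=10_000)
large = rng.lognormal(mean=mu_large, sigma=sigma_large, size=10_000)

# With these tight scale choices the supports barely touch:
# even the largest "small" draw sits far below the smallest "large" draw.
print("max small:", small.max(), "| min large:", large.min())
```

If you tried the same experiment with much wider sigmas, the draws would start to interleave and the exchangeability problem would return, which is exactly what the informative variance priors are there to prevent.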

Once you have a reasonable model for one time point, then you can consider the time dynamics. Do the component properties change with time, or just their relative weights? If the former, what are reasonable scales for those changes and how can you encode them into, say, a spline, a time series model, or a Gaussian process? If the latter, then you want to keep the component properties constant but instead model the time dynamics of the logit probability of the first component.
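The second case above can be sketched generatively as follows. This is a hypothetical example (the linear logit trend, its coefficients, and the component values are all made up for illustration): the component locations and scales stay fixed, and only the mixture weight moves smoothly over time through its logit.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: fixed component properties, time-varying weight.
times = np.linspace(0.0, 10.0, 50)
alpha, beta = -2.0, 0.5            # made-up logit-linear trend in time
logit_p = alpha + beta * times     # logit of the first component's weight
p_first = 1.0 / (1.0 + np.exp(-logit_p))

# Component locations and scales (on log size) are constant across time.
mu = np.array([np.log(1e-6), np.log(1e-3)])
sigma = np.array([0.2, 0.2])

# Simulate one observation per time point from the mixture.
z = rng.random(times.size) < p_first  # component membership per time point
log_size = np.where(z,
                    rng.normal(mu[0], sigma[0], times.size),
                    rng.normal(mu[1], sigma[1], times.size))
```

A spline or Gaussian process on `logit_p` would replace the linear trend here; the key point is that only the weight carries the time dynamics while the components themselves stay put.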

Hi Michael:

Thanks for your instructive suggestion. Yes, I guess that to fit such a complex model I really need to start from the simplest model and build all the way up, to ensure that I understand the model behaviour at each stage.

Following your previous post, you mentioned something about the sum-to-zero constraint.

In my previous model, which fit just a B-spline to the dataset without the two-component part, I imposed a sum-to-zero constraint on my betas to ensure the identifiability of the B-spline coefficients. However, while fitting the current model (mixture + B-spline), I initially skipped this constraint, but re-introduced it after you mentioned it.
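For concreteness, the sum-to-zero constraint can be sketched as a simple centering of the coefficient vector (the coefficient values below are made up for illustration). Centering removes the constant shift that would otherwise be shared between the spline level and the mixture locations, which is one common source of the non-identifiability discussed here:

```python
import numpy as np

# Hypothetical unconstrained spline coefficients (e.g. raw sampler output).
beta_raw = np.array([0.3, -1.2, 0.8, 0.5, -0.1])

# Enforce the sum-to-zero constraint by centering. Any constant added to
# all betas shifts the fitted curve up or down, and that shift would be
# confounded with the mixture locations; centering pins it to zero.
beta = beta_raw - beta_raw.mean()

print("sum of constrained betas:", beta.sum())  # ~0 up to floating point
```

In Stan this same idea is typically expressed by declaring one fewer free coefficient and defining the last one as the negative sum of the rest, but the centering view above is the easiest way to see why it helps.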

It appears that re-introducing this constraint has improved my model, in the sense that it no longer has any divergent transitions.

I just ran 4 chains, each with 5000 iterations and warmup = 1000, and did not get any divergent transitions.
According to the posterior plots, the locations (mus) and scales (sigmas) only switched in chain 3; the rest behave quite nicely.
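The switching in chain 3 is classic label switching. One standard post-hoc fix, valid only when the components are well separated as they are here, is to relabel each draw by sorting on the locations. This is a toy sketch with fabricated draws, not output from the actual model:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy posterior draws for two well-separated component locations;
# pretend some fraction of draws label-switched (columns swapped).
mu_draws = np.column_stack([rng.normal(-1.0, 0.1, 1000),
                            rng.normal(2.0, 0.1, 1000)])
swapped = rng.random(1000) < 0.25
mu_draws[swapped] = mu_draws[swapped, ::-1]

# Post-hoc relabeling: sort each draw so component 1 is always the
# smaller location. Apply the same permutation to sigmas and weights.
mu_sorted = np.sort(mu_draws, axis=1)
```

After sorting, the marginal for each column is unimodal again. The alternative is to prevent switching in the first place, e.g. with an `ordered` declaration on the locations in Stan, which is usually preferable to post-processing.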

I also attached the marginal posterior density graphs for the mixture component locations and scales; clearly they are bimodal.

Given that there are no divergent transitions, does the model estimate my simulated data well, so that I can use this model structure to explore the real data? Thanks in advance.
Rplot03.pdf (7.1 KB)
Rplot04.pdf (7.4 KB)
Rplot02.pdf (7.1 KB)
Rplot05.pdf (7.7 KB)
Rplot.pdf (835.8 KB)
Rplot01.pdf (872.9 KB)

Yes, yes, yes! That’s by far the most efficient way to develop a complex model.

It depends on what you mean by "explore". What your fit will not tell you is the relative importance of those two modes. So if you're just trying to explore how the different possibilities behave, without needing accurate quantification of their relative importances (as is common with, for example, Latent Dirichlet Allocation), then you can go ahead and explore.

That said, the physical nature of your problem would make me a bit hesitant, as the relative importance of the modes will matter for the physical interpretation. Personally I would try to build up something more generative and consistent with the underlying physical system that doesn't exhibit multimodality.

This seems to be related to density regression, which I recommend checking out, unless you are certain about the number of components and the shape of the components forming the mixture.
