Identification of mixture of multivariate normal distributions

Hi Guido,

Thanks for your post! I have a similar use case, addressing label switching for the ability and discrimination parameters in multidimensional IRT models, and am working through what to do.

If you haven’t seen Jasra, Holmes, and Stephens (2005), I’d definitely give it a read, particularly page 60. They note that, for the random beta mixture model defined in the paper, imposing identifiability constraints on the mu parameters (i.e., ordering them) produces different estimated means for those parameters than relabeling with Stephens’ algorithm does. That echoes a cautionary point made by Michael Betancourt (@betanalpha) here and here. Problems appear to arise when the posterior distributions of the K mu parameters overlap to a considerable extent, and whether they do is an empirical question.
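To see why overlap matters, here is a toy sketch (in Python for brevity; the same logic applies in R). It is not the beta mixture from the paper; the distributions and values are made up purely to illustrate the mechanism: when two location parameters overlap, an ordering constraint systematically pushes their estimated means apart.

```python
import numpy as np

rng = np.random.default_rng(42)
n_draws = 100_000

# Pretend these are posterior draws of two heavily overlapping mu parameters,
# with true means 0.0 and 0.5 (illustrative values, not from the thread).
mu1 = rng.normal(0.0, 1.0, n_draws)
mu2 = rng.normal(0.5, 1.0, n_draws)

# Ordering constraint: relabel each draw so that mu_lo <= mu_hi.
mu_lo = np.minimum(mu1, mu2)
mu_hi = np.maximum(mu1, mu2)

# The constrained means are biased apart relative to the true 0.0 and 0.5,
# precisely because the two posteriors overlap.
print(np.mean(mu_lo), np.mean(mu_hi))
```

If the two posteriors were far apart, the min/max relabeling would almost never actually swap anything and the bias would vanish; that is the sense in which the severity of the problem is an empirical question.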

You said:

I’m not sure I understand: isn’t that what you want? The reason I’m confused is that Stephens’ algorithm is a relabeling algorithm. Say you have two parameters, alpha and beta, with alpha coming from distribution A and beta coming from distribution B, and relabeling alpha as beta and beta as alpha doesn’t change the joint posterior probability of the two parameters given the data. Now suppose you run two chains and draw just three samples each, so that your draws look like {aaabbb} for alpha and {bbbaaa} for beta (where “a” denotes a sample from distribution A, and so on). Running Stephens’ algorithm would return {aaaaaa} for alpha and {bbbbbb} for beta (or vice versa).

You’re saying that you’re satisfied that the output of Stephens’ algorithm is two unimodal distributions, but dissatisfied that the input to it was multimodal, right? But that’s fine: the multimodality is inherent in the structure of your problem, and I think what you want is to assign one consistent set of modes to the parameters across chains while still exploring the entire parameter space. That’s exactly what relabeling algorithms are designed to do. Am I misunderstanding your question?
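Here is the {aaabbb}/{bbbaaa} example as a small sketch (Python for brevity). This is a simple distance-to-pivot relabeling, not Stephens’ algorithm itself, but on this toy input it has the same effect: each draw’s labels are swapped where needed so that each parameter ends up unimodal.

```python
import numpy as np

# Six draws per parameter; the labels switch halfway through the chain.
alpha = np.array([1.0, 1.1, 0.9, 5.0, 5.1, 4.9])  # "aaabbb"
beta  = np.array([5.0, 4.9, 5.1, 1.0, 0.9, 1.1])  # "bbbaaa"

# Use the first draw as the reference labeling (the "pivot").
pivot = np.array([alpha[0], beta[0]])

relabelled = []
for a, b in zip(alpha, beta):
    # Keep this draw's labels if they match the pivot better; otherwise swap.
    keep = abs(a - pivot[0]) + abs(b - pivot[1])
    swap = abs(b - pivot[0]) + abs(a - pivot[1])
    relabelled.append((a, b) if keep <= swap else (b, a))

alpha_new, beta_new = map(np.array, zip(*relabelled))
print(alpha_new)  # all values near 1: {aaaaaa}
print(beta_new)   # all values near 5: {bbbbbb}
```

Note that nothing about the sampled values changes; only the labels do, which is why the relabeled output still covers the same region of the parameter space.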

Regardless, it would help to see screenshots and the R code that transforms the data in a stanfit object into input for the label.switching package, calls the “STEPHENS” method, and then converts the output back into a stanfit object (or otherwise examines the posterior distributions of the parameters in question). That would help us understand what’s going wrong for you. Selfishly, I’m also about to write exactly the post-processing R code you just wrote, and I bet I’m not the only person out there who needs to post-process stanfit objects with relabeling algorithms.
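For what it’s worth, here is a sketch of the core post-processing step (Python for brevity; the R version over a stanfit draws array would be analogous). If I remember correctly, label.switching returns a per-draw permutation matrix; the function name and shapes below are illustrative assumptions, not an actual package API.

```python
import numpy as np

def apply_permutations(draws, perms):
    """Apply a per-draw relabeling: out[i, k] = draws[i, perms[i, k]].

    draws: (n_draws, K) array of posterior draws for one K-component parameter.
    perms: (n_draws, K) integer array, one permutation of 0..K-1 per draw.
    """
    out = np.empty_like(draws)
    for i in range(draws.shape[0]):
        out[i] = draws[i, perms[i]]
    return out

draws = np.array([[1.0, 5.0],
                  [5.0, 1.0],   # labels switched on this draw
                  [1.0, 5.0]])
perms = np.array([[0, 1],
                  [1, 0],       # swap this draw back
                  [0, 1]])

fixed = apply_permutations(draws, perms)
print(fixed)  # every row is now [1.0, 5.0]
```

The same permutations would need to be applied consistently to every component-indexed parameter in the fit (e.g., both the means and the mixing weights), which is the fiddly part when rebuilding a stanfit-like object.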

As a side note, I’m finding that working with simulated data is helping me to debug because I can distinguish label switching from other forms of multimodality (between chains) such as rotation invariance.

Hope that helps, and thank you in advance as well!

Cheers,

Richard