Gibbs post-processing to find unknown K in mixture model

Hey, thanks for the reply. Averaging over the discrete parameters seems to be the canonical advice. In the example above, K \approx \sum_{h=1}^H \mathbb{1}\{n_h > 0\} (i.e. the number of components with at least one allocated data point), and the allocations themselves are distributed Categorical(\lambda), so one should be able to marginalize over the discrete allocations and then just use \lambda to work out a distribution for K.
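
As a minimal sketch of that post-processing step (my own illustration, not your model): assuming you have posterior draws of the simplex \lambda as a draws-by-H array and N data points, you can simulate the allocations per draw and count occupied components. The function name and array shapes below are assumptions for the example.

```python
import numpy as np

def posterior_K(lambda_draws, N, rng=None):
    """Monte Carlo distribution of K (occupied components) given
    posterior draws of the mixture weights lambda (shape: draws x H)."""
    rng = np.random.default_rng(rng)
    S, H = lambda_draws.shape
    K = np.empty(S, dtype=int)
    for s in range(S):
        # allocate N points to components according to lambda^(s)
        counts = rng.multinomial(N, lambda_draws[s])
        # K is the number of components with at least one point
        K[s] = np.sum(counts > 0)
    return np.bincount(K, minlength=H + 1) / S  # P(K = 0), ..., P(K = H)

# example usage with fake posterior draws
# lam = np.random.dirichlet(np.ones(10), size=4000)
# print(posterior_K(lam, N=200))
```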

Two problems are identifiability and priors. Betancourt’s excellent piece on identifiability was very useful. He says that one can use ordering constraints to achieve identifiability (which is partially contradicted by BDA3, which says label-switching issues can persist nonetheless). Let’s say, for argument’s sake, that this is solvable; one crude post-hoc version is sketched below.
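
Since the thread is about Gibbs post-processing, here is a sketch of the post-hoc counterpart of that ordering constraint: relabel each draw by sorting on the component means and permuting the weights to match. The array names mu_draws and lambda_draws are my own assumptions, sorting by means is only one possible identification, and BDA3’s caveat about residual label switching still applies.

```python
import numpy as np

def relabel_by_mean(mu_draws, lambda_draws):
    """Post-hoc relabelling of Gibbs draws: impose mu_1 < ... < mu_H
    within each draw by sorting, and permute the weights to match."""
    order = np.argsort(mu_draws, axis=1)                      # per-draw permutation
    mu_sorted = np.take_along_axis(mu_draws, order, axis=1)
    lam_sorted = np.take_along_axis(lambda_draws, order, axis=1)
    return mu_sorted, lam_sorted
```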

The other problem is priors. In your model above, the prior on the simplex p is all-important and will decide how the H components are weighted. BDA3 recommends a symmetric Dirichlet(\frac{1}{H}) prior to force concentration (allocation to fewer clusters). However, like any prior, as the amount of data increases the data overwhelm the prior, and we end up with allocations spread across all H components, because more components will lead to a higher log posterior probability. Meanwhile, if the amount of data is small, then “where does your prior come from?” seems like an entirely valid question, because the prior becomes instrumental in deciding how many clusters there will be, and a principled choice is important.
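
To make the prior’s role concrete, here is a small prior-predictive sketch (again my own illustration, not part of your model): draw p from a symmetric Dirichlet, allocate N points, and look at the implied distribution of K for a few values of the concentration and of N. The helper name and the particular values of alpha, H and N are made up for the example.

```python
import numpy as np

def prior_K(alpha, H, N, draws=4000, rng=None):
    """Prior-predictive distribution of K (occupied components) under a
    symmetric Dirichlet(alpha) prior on the weights, for N data points."""
    rng = np.random.default_rng(rng)
    p = rng.dirichlet(np.full(H, alpha), size=draws)          # draws x H
    counts = np.array([rng.multinomial(N, pi) for pi in p])   # draws x H
    K = (counts > 0).sum(axis=1)
    return np.bincount(K, minlength=H + 1) / draws

# how the implied K shifts with the concentration and with N
# for alpha in (1 / 10, 1.0):
#     for N in (50, 5000):
#         print(alpha, N, prior_K(alpha, H=10, N=N).round(2))
```

With small N the concentration parameter largely dictates where K lands, which is exactly why the “where does your prior come from?” question bites.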

I’ve not yet encountered a principled, generic way to approach “mixture models with unknown K”. It seems to me that I’ll just have to work harder and add a lot more structure.
