Mixture model with unknown k


Hello everyone!

I have recently been reading about mixture models, especially the cases in which the number of components, k, is treated as a random variable with its own posterior distribution.

I have read in several articles that sampling from such a posterior is quite complicated, partly because the posterior’s dimensionality changes with k.

Some authors have proposed other sampling strategies, such as reversible-jump Monte Carlo or Fu and Wang’s algorithm, exploring the distribution of parameters conditional on k (2002, 2007). Unfortunately, I do not understand these algorithms well enough to recognize their similarities to, or differences from, the ones implemented in Stan.

So I will ask the question very directly: is Stan able to deal with this kind of problem, i.e. exploring a posterior distribution of changing dimension, given a random integer variable k?

Thank you very much!
Best regards,


The simple answer is “no”. Stan cannot have a discrete unknown parameter.


Thank you!

And is it possible to make k follow a probability mass function, such as a Poisson distribution? Something like making mu the random variable and k ~ poisson(mu)?


No. That would be discrete. If you have only a finite number of possible k, you can sum it out analytically (see the manual), but this isn’t the case you are talking about (because then it’s a mixture with fixed k, perhaps with some of the components empty).
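For a finite set of candidate k, the "sum it out" step can be sketched in plain Python (not Stan). Everything named here is an assumption for illustration: the function names, the choice of a Poisson prior truncated to 1..k_max, and the input log_lik_by_k, which stands for values of log p(y | k) that would have to be computed elsewhere.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def truncated_poisson_log_pmf(k, mu, k_max):
    """log P(k) for a Poisson(mu) restricted to 1..k_max and renormalized."""
    def log_pois(j):
        return j * math.log(mu) - mu - math.lgamma(j + 1)

    log_norm = log_sum_exp([log_pois(j) for j in range(1, k_max + 1)])
    return log_pois(k) - log_norm


def marginalize_k(log_lik_by_k, mu):
    """Sum k out of the joint density.

    log_lik_by_k[i] is a hypothetical, precomputed log p(y | k = i + 1).
    Returns (log p(y), posterior probabilities of k = 1..k_max).
    """
    k_max = len(log_lik_by_k)
    terms = [
        truncated_poisson_log_pmf(k, mu, k_max) + log_lik_by_k[k - 1]
        for k in range(1, k_max + 1)
    ]
    log_marginal = log_sum_exp(terms)
    posterior_k = [math.exp(t - log_marginal) for t in terms]
    return log_marginal, posterior_k


log_marginal, posterior_k = marginalize_k([-10.0, -8.0, -9.0], mu=2.0)
print(log_marginal, posterior_k)
```

The sampler only ever sees the continuous quantity log_marginal; the posterior over k is recovered afterwards from the same terms.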


You can just pick a relatively large number of components and sum out the discrete variable analytically. Then you just need to deal with there being too many components by shrinking their contribution towards zero or allowing them to overlap… but this is more of a “can you” answer rather than a “should you” answer b/c I haven’t done this myself so I don’t know how well/badly it works.
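As a rough illustration of what "sum out the discrete variable" means for a mixture with a deliberately large number of components, here is a minimal Python sketch (not Stan) of the marginal log-likelihood of a one-dimensional Gaussian mixture. The function names and the Gaussian component choice are assumptions made for the example only.

```python
import math


def normal_log_pdf(y, mean, sd):
    """Log density of Normal(mean, sd) at y."""
    z = (y - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mixture_log_lik(ys, weights, means, sds):
    """Log-likelihood with each observation's component label summed out.

    For every y the sum runs over all components, so no discrete
    assignment is ever sampled. Components with weight exactly zero
    are skipped because they contribute nothing to the sum.
    """
    total = 0.0
    for y in ys:
        terms = [
            math.log(w) + normal_log_pdf(y, m, s)
            for w, m, s in zip(weights, means, sds)
            if w > 0.0
        ]
        total += log_sum_exp(terms)
    return total


ys = [-1.0, 0.2, 3.1]
two = mixture_log_lik(ys, [0.5, 0.5], [0.0, 3.0], [1.0, 1.0])
# A third, "empty" component with zero weight leaves the likelihood unchanged.
three = mixture_log_lik(ys, [0.5, 0.5, 0.0], [0.0, 3.0, 10.0], [1.0, 1.0, 1.0])
print(two, three)
```

A surplus component whose weight is shrunk towards zero behaves almost like the zero-weight case here, which is the sense in which extra components can be left "empty".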


This strategy is employed when implementing Latent Dirichlet Allocation and ecology “super population” models in Stan. The ultimate success of the strategy depends on the particulars of the model – Latent Dirichlet Allocation is so poorly identified that marginalizing a finite number of modes leads to a mess whereas the ecology models fit really well.


Actually, my interest is in the partition of functional trait space
sampled in plant communities. I’m especially interested in testing niche
theory through different constraints on the variance of components.

I know the number of species in each community, so I could possibly fix a
maximum number of components. But I don’t quite understand how the
model could fit “empty” or null components, instead of dividing the space
into the smallest parts possible (a behaviour I observed with the EM


These methods never decide whether a component is included or not. Instead they marginalize over inclusion by giving each component a probability of explaining each observation. After the fact you can generate consistent assignments by sampling from those probabilities, but the probabilities themselves will always contain more information.
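To make "a probability of explaining each observation" concrete, here is a small Python sketch (not Stan) for a one-dimensional Gaussian mixture. The function names are hypothetical, and the weights are assumed to be strictly positive. It computes the per-component probabilities for one observation and then draws a hard assignment from them after the fact.

```python
import math
import random


def normal_log_pdf(y, mean, sd):
    """Log density of Normal(mean, sd) at y."""
    z = (y - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def responsibilities(y, weights, means, sds):
    """P(component j generated y | parameters), for each j.

    Assumes every weight is strictly positive.
    """
    log_terms = [
        math.log(w) + normal_log_pdf(y, m, s)
        for w, m, s in zip(weights, means, sds)
    ]
    shift = max(log_terms)  # subtract the max before exponentiating
    unnorm = [math.exp(t - shift) for t in log_terms]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def sample_assignment(probs, rng):
    """Draw one component index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


rng = random.Random(1)
probs = responsibilities(0.4, [0.3, 0.7], [-1.0, 1.0], [1.0, 1.0])
label = sample_assignment(probs, rng)
print(probs, label)
```

The sampled label throws away the uncertainty held in probs, which is why the probabilities carry more information than any single assignment.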

The key question is whether or not the components behave sufficiently differently that the posterior for those probabilities will be unimodal and possible to fit.