Best practice for selecting number of mixture model components


I have a number of datasets with values constrained to lie between 0 and 1. I've set up a mixture model of two beta distributions, which fits well in Stan. However, I'd like to decide quantitatively in each case whether to use one, two or three mixture components: there are physical reasons why we might expect to see two beta distributions in some datasets but only one in others. What's the best practice for choosing the number of mixture components to fit?
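For reference, here's a rough Python sketch of the kind of marginalised beta-mixture likelihood I have in mind (the same quantity the Stan model builds with `log_mix` / `log_sum_exp`). The data and parameter values below are just placeholders for illustration:

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import beta

# Placeholder data on (0, 1) and made-up parameter values
y = np.array([0.10, 0.15, 0.80, 0.85])
theta = 0.5                      # mixing weight of component 1
a1, b1 = 2.0, 8.0                # beta shape parameters, component 1
a2, b2 = 8.0, 2.0                # beta shape parameters, component 2

# Marginalised log-likelihood of the two-component beta mixture:
# log p(y_i) = log_sum_exp(log theta + logBeta1(y_i),
#                          log(1 - theta) + logBeta2(y_i))
lp = logsumexp(
    [np.log(theta) + beta.logpdf(y, a1, b1),
     np.log(1.0 - theta) + beta.logpdf(y, a2, b2)],
    axis=0,
).sum()
```

Working on the log scale with `logsumexp` mirrors what Stan requires, since the discrete component membership has to be marginalised out.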

I've thought about fitting one, two and three components in every case and then choosing the model with the best PSIS-LOO score (for example). However, I've also seen the idea here of fitting a model with a large number of mixture components and a regularising prior that concentrates the posterior on fewer components. Is there a reason to prefer one technique over the other? Or is there another way to select the "best" number of mixture components?
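To illustrate what I mean by a regularising prior: a symmetric Dirichlet on the mixture weights with concentration well below 1 pushes prior mass towards a few active components. A quick numpy sketch (the choice of K = 5 and the alpha values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
K = 5

# Symmetric Dirichlet(alpha) prior over K mixture weights:
# alpha << 1 is sparsity-inducing, alpha = 1 is uniform over the simplex
sparse = rng.dirichlet(np.full(K, 0.1), size=10_000)
diffuse = rng.dirichlet(np.full(K, 1.0), size=10_000)

# Fraction of prior draws in which a single component dominates
print((sparse.max(axis=1) > 0.9).mean())
print((diffuse.max(axis=1) > 0.9).mean())
```

Under the sparse prior a single component dominates in far more prior draws, which is the "concentrate onto fewer components" behaviour described above.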

Thanks very much!

Hey @samvaughan, welcome to the community. I'd suggest a PSIS-LOO based model comparison approach might be appropriate in your case. This thread, I think, closely follows your own requirements and should be instructive.

Model averaging, mixture models and model selection are three different methods, but they all seem to overlap somehow in cases like yours. However, they do not lead to the same implications. @avehtari has a paper on this very theme and may be able to add more colour with regards to why.

In my own experience, I've found regularised priors to be a very blunt instrument for working with mixtures that are not already well separated.


Thanks @emiruz for your reply! With regards to the paper by @avehtari, do you mean this one from 2016? I’ve also found some of his notes here which look incredibly useful.

From memory, the paper was some sort of comparison of ways to stack or average models.
