Best practice for selecting number of mixture model components

samvaughan · May 11, 2020, 6:01am

Hi,

I have a number of datasets with values constrained to lie between 0 and 1. I’ve set up a mixture model of two beta distributions which fits well in Stan. However, I’d like to decide quantitatively in each case whether to use one, two or three mixture components: there are physical reasons why we might expect to see two beta distributions in some datasets but only one in others. What’s the best practice for choosing the number of mixture components to fit?

I’ve thought about fitting one, two and three components in every case and then choosing the model with the best PSIS-LOO score (for example). However I’ve also seen the idea here of fitting a model with a large number of mixture components and a regularising prior to force a concentration to fewer components. Is there a reason to prefer one technique over the other? Or is there another way to select the “best” number of mixture components?

Thanks very much!

emiruz · May 11, 2020, 7:59am

Hey @samvaughan, welcome to the community. I’d suggest that a PSIS-LOO based model comparison approach might be appropriate in your case. This thread, I think, closely follows your own requirements and should be instructive.
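As a rough illustration of what a LOO-style comparison works with: the quantity needed is the pointwise log-likelihood of each observation under each candidate model. Below is a minimal Python sketch (standard library only) of that quantity for a K-component beta mixture. The function names, the data and the parameter values are all made up for illustration. In a real analysis this would be evaluated for every posterior draw (for example in Stan's generated quantities block) and handed to a PSIS-LOO implementation, rather than evaluated at fixed parameter values as here.

```python
import math

def beta_logpdf(y, a, b):
    """Log density of Beta(a, b) at y, for 0 < y < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1.0) * math.log(y) + (b - 1.0) * math.log1p(-y)

def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def mixture_loglik(y, weights, shapes):
    """Log-likelihood of one observation under a beta mixture.

    weights: mixing proportions (positive, summing to 1)
    shapes:  one (a, b) shape pair per component
    """
    terms = [math.log(w) + beta_logpdf(y, a, b)
             for w, (a, b) in zip(weights, shapes)]
    return log_sum_exp(terms)

# Hypothetical data and parameter values, for illustration only.
y_obs = [0.08, 0.15, 0.22, 0.71, 0.80, 0.93]

# One component: Beta(1, 1), i.e. uniform on (0, 1).
loglik_one = [mixture_loglik(y, [1.0], [(1.0, 1.0)]) for y in y_obs]

# Two components: one concentrated near 0, one near 1.
loglik_two = [mixture_loglik(y, [0.5, 0.5], [(2.0, 8.0), (8.0, 2.0)])
              for y in y_obs]

print(sum(loglik_one), sum(loglik_two))
```

Marginalising over the component label with log-sum-exp, instead of sampling a discrete label, is the same trick a Stan mixture model uses, so the pointwise values come out in the form a LOO computation expects.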

Model averaging, mixture models and model selection are three different methods, but they all seem to overlap somehow in cases like yours. However, they do not have the same implications. @avehtari has a paper on this very theme and may be able to add more colour with regard to why.
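To make the selection versus averaging distinction a little more concrete, here is a small Python sketch. It assumes each candidate model already has an estimated expected log predictive density (elpd) from LOO. Selection keeps only the model with the highest elpd, while one simple averaging scheme (often called pseudo-BMA weighting) turns the elpd values into weights proportional to exp(elpd). The elpd numbers below are invented purely for illustration, and this is not necessarily the scheme any particular paper recommends.

```python
import math

def pseudo_bma_weights(elpds):
    """Weights proportional to exp(elpd), normalised to sum to 1.

    Subtracting the maximum first keeps the exponentials from underflowing.
    """
    m = max(elpds)
    unnorm = [math.exp(e - m) for e in elpds]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Made-up elpd_loo estimates for one, two and three components.
names = ["one component", "two components", "three components"]
elpd_loo = [-52.3, -47.9, -48.4]

# Model selection: keep the single best-scoring model.
best = names[elpd_loo.index(max(elpd_loo))]

# Model averaging: keep all models, weighted by predictive performance.
weights = pseudo_bma_weights(elpd_loo)

print("selected:", best)
for name, w in zip(names, weights):
    print(f"{name}: weight {w:.3f}")
```

When two models have similar elpd, selection discards one of them entirely while averaging keeps both with comparable weight, which is one way the approaches can lead to different conclusions.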

In my own experience, I’ve found regularising priors to be a very blunt instrument for working with mixtures which are not already very well separated.

samvaughan · May 12, 2020, 1:28am

Thanks @emiruz for your reply! With regard to the paper by @avehtari, do you mean this one from 2016? I’ve also found some of his notes here which look incredibly useful.

emiruz · May 12, 2020, 5:20pm

From memory the paper was some sort of comparison of ways to stack or average models.
