Mixture of Normals as distribution for latent variable

Hi,

First of all, congratulations on creating this wonderful community!

I have a quick question. Is it possible to use normal mixtures in Stan as distributions for latent variables? In all the examples I have seen so far, normal mixtures are directly assigned to data.

For example, how about something like y_i \sim \text{Poisson}(\mu + \epsilon_i) (with y_i observed), where the distribution of each \epsilon_i follows a normal mixture?


Hi,
Stan only very rarely privileges continuous (real, vector, etc.) data over parameters, so most pieces of code that accept data will also accept parameters (there are a few exceptions, but log_mix / log_sum_exp are not among them). So in principle you can treat a mixture over a latent parameter exactly as you would a mixture over data.

In practice, however, Stan doesn’t work well with multimodal posteriors, so unless you enforce that the mixture is not multimodal (e.g. by having a mixture of two distributions that share a mean and differ only in scale), the model is unlikely to work that well.
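To make that concrete, here is a minimal sketch (my own, not from the thread) of a scale-mixture prior on a latent \epsilon_i: both components share a mean of zero and differ only in scale, and ordering the scales breaks label switching. The priors on `mu` and `sigma` are placeholder assumptions, and I use a log link (`poisson_log`) to keep the Poisson rate positive, which the original question did not specify:

```stan
data {
  int<lower=0> N;
  array[N] int<lower=0> y;
}
parameters {
  real mu;
  real<lower=0, upper=1> theta;  // mixing weight
  positive_ordered[2] sigma;     // sigma[1] < sigma[2] rules out label switching
  vector[N] epsilon;             // latent variable with a mixture prior
}
model {
  // placeholder priors -- adjust to your problem
  mu ~ normal(0, 5);
  sigma ~ normal(0, 1);
  // mixture-of-normals prior on each latent epsilon_n:
  // same mean (0), different scales, so the prior stays unimodal
  for (n in 1:N)
    target += log_mix(theta,
                      normal_lpdf(epsilon[n] | 0, sigma[1]),
                      normal_lpdf(epsilon[n] | 0, sigma[2]));
  // log link keeps the rate positive (an assumption on my part)
  y ~ poisson_log(mu + epsilon);
}
```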

But the case you’ve shown (and I would guess most others) can be directly translated into a mixture over the observed data, i.e. you can get an equivalent model via:

y_i \sim \mathrm{Mix}\left(\theta, \mathrm{Poisson}(\mu + \epsilon_{i, 1}), \mathrm{Poisson}(\mu + \epsilon_{i, 2})\right) \\ \epsilon_{i, 1} \sim N(a_1, \sigma_1) \\ \epsilon_{i, 2} \sim N(a_2, \sigma_2)

This should be pretty well behaved for the most part.
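The equations above could be sketched in Stan roughly as follows (my own sketch, again using a log link as an assumption; the priors on `mu`, `a1`, `a2`, and the scales are placeholders):

```stan
data {
  int<lower=0> N;
  array[N] int<lower=0> y;
}
parameters {
  real mu;
  real<lower=0, upper=1> theta;  // mixing weight
  real a1;
  real a2;
  real<lower=0> sigma1;
  real<lower=0> sigma2;
  vector[N] epsilon1;
  vector[N] epsilon2;
}
model {
  // placeholder priors -- adjust to your problem
  mu ~ normal(0, 5);
  a1 ~ normal(0, 1);
  a2 ~ normal(0, 1);
  sigma1 ~ normal(0, 1);
  sigma2 ~ normal(0, 1);
  // one latent draw per component
  epsilon1 ~ normal(a1, sigma1);
  epsilon2 ~ normal(a2, sigma2);
  // the mixture now sits on the observed y, not on a latent parameter
  for (n in 1:N)
    target += log_mix(theta,
                      poisson_log_lpmf(y[n] | mu + epsilon1[n]),
                      poisson_log_lpmf(y[n] | mu + epsilon2[n]));
}
```

Note that constraints (e.g. ordering a_1 < a_2) may still be needed to avoid label switching between the two components.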

Best of luck with your model!


Many thanks for the equivalent model; it seems interesting and I will definitely try it!

About Stan not working well with multimodal posteriors: I am wondering whether this can be addressed by running several parallel chains and post-processing the MCMC output.


In theory: yes. In practice, it IMHO rarely works. Most multimodal posteriors actually exhibit weird curvature that breaks sampling (e.g. gives you divergent transitions); for the most part this is a good thing, as it lets you notice multimodality even if you were not aware of it. Additionally, it is very hard to ensure that your chains actually visited all the modes, as the “attraction basin” of a mode (i.e. the set of initial conditions from which it is likely to be found) may be relatively small even when the mode in fact contains a substantial amount of posterior mass. Most models just tend to work much better when you remove the multimodality in some way.

I see your point, many thanks