Hi,

I am reading this article on mixture models: https://mc-stan.org/users/documentation/case-studies/identifying_mixture_models.html

The advice there is to use likelihood functions in which the discrete variables are not sampled but are “marginalized out”.
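For concreteness, here is a minimal sketch of what I understand by “marginalized out”, assuming a Gaussian mixture with K components. The function name `mixture_log_lik` and its arguments are my own choices for illustration, not taken from the article; the point is only that the component labels never appear, because they are summed over inside the log-likelihood.

```python
import numpy as np

def mixture_log_lik(y, theta, mu, sigma):
    """Log-likelihood of a Gaussian mixture with the component labels summed out.

    y:     observations, length N
    theta: mixing proportions, length K (assumed positive, summing to 1)
    mu:    component means, length K
    sigma: component standard deviations, length K
    """
    y = np.asarray(y, dtype=float)[:, None]  # shape (N, 1), broadcasts against (K,)
    theta = np.asarray(theta, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)

    # log N(y_n | mu_k, sigma_k) for every observation n and component k
    log_norm = (-0.5 * np.log(2.0 * np.pi)
                - np.log(sigma)
                - 0.5 * ((y - mu) / sigma) ** 2)

    # log sum_k theta_k * N(y_n | mu_k, sigma_k), computed stably per observation
    per_obs = np.logaddexp.reduce(np.log(theta) + log_norm, axis=1)
    return per_obs.sum()
```

Only the continuous parameters `theta`, `mu` and `sigma` are arguments here, which is how I read the article's advice.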

I think I am missing something (important?) here: why would it even cross anybody’s mind to use a likelihood function in which the discrete variables are not marginalized out?

Why would that ever be useful?

Maybe for “semi-supervised learning”? But even then, if the classes are partially observed, there is no need to sample the observed ones because they are already known :)

I cannot imagine what it would mean to use a likelihood function for a mixture model where the discrete variables are not marginalized out.

I think I am not getting something very basic. Could someone please enlighten me?

I mean, it is not the density that is discrete but the variable itself, and we sample the parameters, which are always continuous, right? So I don’t really see what the problem with the discrete variables is.

Doing MC for the Ising model is a different matter, because there the energy depends on discrete variables. Here, however, as far as I understand, the energy depends only on continuous variables (the parameters).

Could someone please give a very simple example where it is a good idea to sample from discrete distributions? Can they not always be marginalized out? (By writing a for loop, or two, or three?)
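To make the “for loop” idea concrete, here is a small sketch of what I have in mind, assuming a Gaussian mixture in which each observation has its own label and the labels are independent given the parameters. All function names are made up by me for illustration. The first function uses two nested loops (over observations and over labels); the second sums the joint density over every one of the K**N label assignments by brute force. Under the independence assumption the two should agree.

```python
import itertools
import math

def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma) at y."""
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def marginal_by_loops(y, theta, mu, sigma):
    """Marginal likelihood with one loop over observations and one over labels."""
    total = 1.0
    for y_n in y:
        s = 0.0
        for k in range(len(theta)):
            s += theta[k] * normal_pdf(y_n, mu[k], sigma[k])
        total *= s
    return total

def marginal_by_enumeration(y, theta, mu, sigma):
    """Same quantity, summing p(y, z) over all K**N label assignments z."""
    n_components = len(theta)
    total = 0.0
    for z in itertools.product(range(n_components), repeat=len(y)):
        joint = 1.0
        for y_n, k in zip(y, z):
            joint *= theta[k] * normal_pdf(y_n, mu[k], sigma[k])
        total += joint
    return total
```

For example, calling both functions with `y = [0.1, -1.2, 2.3]`, `theta = [0.3, 0.7]`, `mu = [-1.0, 2.0]`, `sigma = [1.0, 0.5]` should give the same value up to floating-point error. The nested-loop version costs N*K terms, while the brute-force version costs K**N, so my question is really whether the cheap form is always available.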

Cheers,

Jozsef