I am reading this article on mixture models: https://mc-stan.org/users/documentation/case-studies/identifying_mixture_models.html
The advice there is to use likelihood functions in which the discrete variables are not sampled but are “marginalized out”.
I think I am missing something (important?) here: why would it even cross anybody’s mind to use a likelihood function in which the discrete variables are not marginalized out?
Why would that ever be useful?
Maybe for “semi-supervised learning”? But even then, if the classes are partially observed, there is no need to sample them because they are already known :)
I cannot imagine what it would mean to use a likelihood function for a mixture model where the discrete variables are not marginalized out.
I think I am not getting something very basic. Could someone please enlighten me?
I mean, it’s not the density that is discrete but the variable itself, and we sample the parameters, which are always continuous, right? So I don’t really see where the discreteness problem comes in.
Doing MC for the Ising model is a different issue, because there the energy depends on discrete variables. Here, however, as far as I understand, the energy depends only on continuous variables (the parameters).
Could someone please give a very simple example where it is a good idea to sample from discrete distributions? Can they not always be marginalized out? (With a for loop, or two, or three?)
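To make concrete what I mean by marginalizing with a loop: here is a minimal sketch (my own illustration, not from the linked case study) of a Gaussian mixture log-likelihood where the discrete component indicators z_n are summed out, so only the continuous parameters (weights, means, scales) remain. The function name and the toy data are my own assumptions.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

def mixture_loglik(y, weights, mus, sigmas):
    """Log-likelihood of a K-component Gaussian mixture with the
    discrete indicators z_n marginalized out:
        log p(y_n) = log sum_k w_k * N(y_n | mu_k, sigma_k),
    computed stably with logsumexp. Depends only on continuous
    parameters -- no discrete variable is ever sampled."""
    # Shape (N, K): log density of each observation under each component.
    comp_logpdf = norm.logpdf(y[:, None], loc=mus[None, :], scale=sigmas[None, :])
    # The "for loop over components" is the sum inside logsumexp over axis 1.
    return logsumexp(np.log(weights)[None, :] + comp_logpdf, axis=1).sum()

# Toy data from a two-component mixture (assumed example values).
rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(-2.0, 1.0, 50), rng.normal(3.0, 1.0, 50)])
ll = mixture_loglik(y,
                    weights=np.array([0.5, 0.5]),
                    mus=np.array([-2.0, 3.0]),
                    sigmas=np.array([1.0, 1.0]))
```

This is the marginalized likelihood I have in mind: a plain continuous function of the parameters, so HMC-style samplers can use it directly.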