Random Question: Stan Model vs EM Algorithm

Hi everyone, I’m new here so I’m not too sure how to phrase this question (or where to put it). But I am fitting multiple two component Gaussian mixture models to genetic data. I first used a vanilla EM algorithm (MClust library) and then I mirrored the user manual to write up a similar model in stan. I assumed that the models would be extremely similar, but the Stan model seems to separate the mixtures much more than the EM algorithm (whenever the components are not obvious). I attached my code and a case to illustrate the difference. The graph labeled stan is the stan model and the bottom graph is the generic EM. Does anyone know why this is the case? Improvements on my simple code is also appreciated.

data {
  int<lower = 1> N; //number of genes
  int<lower = 1> J; //number of experiments
  matrix[N,J] y; //data
} parameters {
  simplex[2] theta[N]; //mixing proportions
  ordered[2] mu[N]; //locations of mixture components
  vector<lower = 0>[N] sigma; //sdevs 
} model {
  vector[2] log_theta[N];
  
  sigma ~ lognormal(0,2);
  
  for(n in 1:N) {
    log_theta[n] = log(theta[n]);
    
    mu[n] ~ normal(0, 2);
    
    for(j in 1:J) {
      vector[2] lps = log_theta[n];
      
      for(k in 1:2) {
        lps[k] = lps[k] + normal_lpdf(y[n,j]| mu[n][k], sigma[n]);
      }
      
      target += log_sum_exp(lps);
    }
  }
}

If you are using the (default) MCMC algorithm, then it is no surprise that posterior means / medians are different from the deterministic maximum of EM. Also, since Stan’s default MCMC algorithm actually samples the tails of a distribution if all goes well, it is even less of a surprise.

1 Like

mclust has many different mixture models. For example the (co)variance of dimensions can be fixed or free across components. Which of the mclust models is your stan code emulating?
Your model seems to assume that the variances for each dimension is constant across mixture components, whereas the covariance between dimensions remains unmodeled.

It’s really full Bayes vs. point estimation you’re comparing, assuming everything’s working properly. You can see the difference within Stan by using optimization—that should give you the same answer as EM if the EM is working and the models are the same.