Hi everyone, I’m new here so I’m not too sure how to phrase this question (or where to put it). But I am fitting multiple two component Gaussian mixture models to genetic data. I first used a vanilla EM algorithm (MClust library) and then I mirrored the user manual to write up a similar model in stan. I assumed that the models would be extremely similar, but the Stan model seems to separate the mixtures much more than the EM algorithm (whenever the components are not obvious). I attached my code and a case to illustrate the difference. The graph labeled stan is the stan model and the bottom graph is the generic EM. Does anyone know why this is the case? Improvements on my simple code is also appreciated.
data {
int<lower = 1> N; //number of genes
int<lower = 1> J; //number of experiments
matrix[N,J] y; //data
} parameters {
simplex[2] theta[N]; //mixing proportions
ordered[2] mu[N]; //locations of mixture components
vector<lower = 0>[N] sigma; //sdevs
} model {
vector[2] log_theta[N];
sigma ~ lognormal(0,2);
for(n in 1:N) {
log_theta[n] = log(theta[n]);
mu[n] ~ normal(0, 2);
for(j in 1:J) {
vector[2] lps = log_theta[n];
for(k in 1:2) {
lps[k] = lps[k] + normal_lpdf(y[n,j]| mu[n][k], sigma[n]);
}
target += log_sum_exp(lps);
}
}
}