I was wondering if there is an efficient way to model finite mixture models (e.g., Gaussians) using binned/grouped data? By binned data, I mean (x = 1,2,…etc, y = 15, 169, … etc) where x are the values, and y are the corresponding number of samples for each value. The examples in the manual work great for individual observations where the observations are fed as a single vector rather than aggregated. The problem is that I have a lot of bins and a lot of observations in each bin. Expanding the data will create a very large vector 100K of individual data points and it will be very slow in Stan. I was able to fit my data using the R package mixR (https://cran.r-project.org/web/packages/mixR/vignettes/intro-to-mixr.pdf) which allows for mixture models to be fit using binned data. The computational gain was huge compared to trying to feed the data as a single vector. However, the likelihood used in mixR for the binned data was specific to expectation maximization that the package uses and I was unable to convert it to Stan. Any help will be appreciated as I have been struggling with this for a few days now. I hope this is clear, and I can provide some code if it helps.

If I understand correctly, with x = 1, 2, 5, ... and y = 15, 169, ... then you have 1 observation of value 15, 2 observations of value 169, 5 observations of value 169, etc. This is easy to deal with because you can write down the mixture log likelihood then just multiply it by the number of observations.

Concretely, it looks like this (modulo typos):

data {
int<lower=0> N; // observations
array[N] int x;
array[N] real y; // observe y[n] a total of x[n] times
int<lower=1> K; // mixture components
}
parameters {
vector<lower=0>[K] sigma;
vector[K] mu;
simplex[K] lambda; // mixing proportions
}
model {
// priors
sigma ~ ...
mu ~ ...
lambda ~ ...
// likelihood
vector[K] log_lambda = log(lambda);
for (n in 1:N) {
vector[K] lp = log_lambda;
for (k in 1:K) {
lp[k] += normal_lpdf(y[n] | mu[k], sigma[k]);
}
real log_lik_y_n = log_sum_exp(lp); // log likelihood of single observation of y[n]
target += x[n] * log_lik_y_n; // scale to x[n] observations of y[n]
}
}

Oops, forgot to answer this. I’m guessing their E-step has the marginalization in the same form I used buried inside somewhere. Usually, you can just translate the E-step of an EM algorithm to define a likelihood in Stan. I did that in the user’s guide for mixture models, LDA, HMM, the CJS model (really just a kind of HMM), data coding models (kinds of mixtures), change point models (latent discrete models, but arguably also a kind of mixture).

That’s what’s going on above. In fact, if you put the model I included into our optimizer, the result is what’s known as “generalized EM”. The generalization is to allow the M-step to hill climb rather than maximize.

Thank you very much for such quick response! Yes, this is exactly what I was looking for. No worries about the EM, I was just looking for others who have looked into fitting a finite mixture to grouped data. I much rather have it in Stan because I am hoping to incorporate this into a CAR model.