Mixture of Gaussian functions

Dear Stan users,

I would like to determine the parameters of a mixture of Gaussian functions. To illustrate this, I simulated a mixture of two normal functions (in R):
x1 <- seq(-5, 6, 0.01)
d1 <- dnorm(x1, -1, 1)
d2 <- dnorm(x1, 2, 1.5)
sum1 <- d1 + d2
plot(x1, sum1, type = “l”)

Next, I would like to determine the means and scales of these two distributions. One option might be to code the Gaussian density as function in Stan but does anyone know whether there are faster alternatives?

Any help would be greatly appreciated.

Best wishes,

This is extensively discussed in the manual. See also http://mc-stan.org/documentation/case-studies/identifying_mixture_models.html.

Dear Michael,

That’s indeed a very useful case study! My questions is currently however not about a mixture of data originating form >1 gaussian distribution but about a mixture of the gaussian functions themselves (sorry for being unclear about this). In other words my data comes from the ®-command dnorm and not from rnorm.

Best wishes,

Mathematically (or philosophically if you’d prefer) your data cannot be density values as densities are not well-defined measure theoretic objects. In other words, I am positing that what you are doing is not a well-posed statistical model.
In particular, there is no well-posed way to extract means and variances from those density values.

OK. But in the example I gave my data can be: d1 + d2 + rnorm(mu, sigma). Where d1 and d2 are two gaussian functions with some measurement error which is incorporated in rnorm. An example of this might be the optical spectrum of a comet. Furthermore, Stan is able to determine the correct mean and scale of d1 and d2 with a gaussian function (f(x) = 1/(s * sqrt(2 * pi)) * exp((-1 * (x - m)^2)/(2 * s^2))) as well as the error term rnorm(mu, sigma). Could you please explain what is wrong with doing this?

I think you’re trying to fit what is typically called a non-linear least squares model, not a mixture model.

That should be straightforward in Stan. Something like (untested):

functions {
    real phi(real x, real mu, real sigma){
        return 1 / (sigma * sqrt(2 * pi())) * exp(-(x - mu)^2/(2 * sigma^2)); 
data {
parameters { 
    ordered[2] m; 
    real<lower=0> s[2];

    real<lower=0> sigma; 
model {
    // Priors

    // Likelihood
    for(i in 1:N){
        y[i] ~ normal(phi(x[i], m[1], s[1]) + phi(x[i], m[2], s[2]), sigma); 

There’s a lot you can do to speed this up (vectorizing phi() is the obvious thing) and I’d imagine the posterior is a bit difficult, but this should get you started.

1 Like

Let’s go with your comet analogy. You have some latent optimal spectrum modeled with a Gaussian which is then convolved with measurement variation, right? At no point are your data “densities” or convolutions thereof. This is perhaps easiest to see if you try to simulate data –

s ~ normal(mu_s, sigma_s);
s_tilde ~ normal(s, sigam_measurement);

The latent spectral observation is a line, not a density. Then that sampled line is varied in the measurement process. These models are discussed in the Measurement Error chapter of the Stan manual.

That may very well not be to what you are referring, in which case you’ll have to clarify. And in that case I would still encourage you to consider the actual data generating process.

Dear Michael and Weylandt,

You are both exactly right and sorry for being unclear about this. Taking the code of Weylandt as giving, my final question is how I can speed up the code of this model? Vectorizing phi in the example of Weylandt is straightforward. Are there other options?

Best wishes,

There are a few ways to improve the program that @weylandt suggested, namely using normal_lpdf directly instead of defining the density by hand and creating an intermediate vector for the mean values and then using a vectorized likelihood call, but the corresponding model still doesn’t make much sense.

Dear Michael,

I attached a reproducible example which is not not very efficiently coded but will run in seconds (indentation was lost when pasting it below, sorry). Do you think I can use normal_lpdf directly (instead of using the function phi) to determine the parameters m1, m2, s1, s2 and sigma in the model?

#simulate some data
x1 <- seq(-5, 6, length.out = 150)
d1 <- dnorm(x1, -1, 1)
d2 <- dnorm(x1, 2, 1.5)
sum1 <- d1 + d2 + rnorm(length(x1), 0, 0.05)
plot(x1, sum1, type = “l”)

data1 <- list(x = x1,
y = sum1,
N = length(x1))

#Stan file
//stan file
real phi(real x, real m, real s){
return 1 / (s * sqrt(2 * pi())) * exp(-(x - m)^2/(2 * s^2));
int<lower=0> N;
real y[N];
real x[N];
real m1;
real m2;
real<lower=0> s1;
real<lower=0> s2;
real <lower=0> sigma;
//prior, inefficiently scripted
target += normal_lpdf(m1 | 0, 2);
target += normal_lpdf(m2 | 0, 2);
target += normal_lpdf(s1 | 0, 1);
target += normal_lpdf(s2 | 0, 1);
target += cauchy_lpdf(sigma | 0, 1);

for (n in 1:N){
target += normal_lpdf(y[n] | phi(x[n], m1, s1) + phi(x[n], m2, s2), sigma);

Best wishes,

Except that you’re working on the linear scale with function phi() and normal_lpdf works on the log scale.


phi(x[n], m1, s1) + phi(x[n], m2, s2)

can be coded as

exp(log_sum_exp(normal_lpdf(x[n] | m1, s1), 
                normal_lpdf(x[n] | m2, s2))

But I have no idea why you’d be using the sum of two normal desnities as a location parameter for a normal.

Writing the priors out with ~ notation will be more efficient because it will drop constants.

Dear Bob,

Many thanks! Works perfect! I’ll mark this item as solved.

Best wishes,