Mixture Model Fitting

I’m trying to fit a mixture model to the data below.
data.R (55.0 KB)

This is what I’ve got so far. How can I improve the fit further? I’m not sure what to do about the left part of the plot. Any help is appreciated.

data {
  int <lower = 0> N;
  vector[N] y;
}

parameters {
  ordered[2] mu;
  real <lower = 0> sigma[2];
  real <lower = 0, upper = 1> theta;
}

model {
  sigma ~ normal(0, 0.5);
  mu[1] ~ normal(0, 2);
  mu[2] ~ normal(4, 1);
  
  mix_weight ~ beta(5, 5);

  for (i in 1:N) {
    target += log_mix(theta,
                      normal_lpdf(y[i] | mu[1], sigma[1]),
                      normal_lpdf(y[i] | mu[2], sigma[2]));
  }
}

Your data looks like it is truncated. Modeling the truncation might improve the fit:

  target += log_mix(theta,
         normal_lpdf(y[i] | mu[1], sigma[1]) - normal_lccdf(0| mu[1], sigma[1]),
         normal_lpdf(y[i] | mu[2], sigma[2]));
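
For context, here is how that adjustment might sit inside the full model. This is an untested sketch that assumes the lower truncation point is 0, truncates both components rather than only the first (my assumption, not part of the suggestion above), declares sigma as a vector, and puts the beta prior on theta.

```stan
data {
  int<lower=0> N;
  vector<lower=0>[N] y;            // assumes all observations lie above the truncation point 0
}

parameters {
  ordered[2] mu;                   // component locations, ordered for identifiability
  vector<lower=0>[2] sigma;        // component scales
  real<lower=0, upper=1> theta;    // mixing weight of the first component
}

model {
  // Priors copied from the original post
  sigma ~ normal(0, 0.5);
  mu[1] ~ normal(0, 2);
  mu[2] ~ normal(4, 1);
  theta ~ beta(5, 5);

  for (i in 1:N) {
    // Subtracting the log of P(Y > 0) renormalizes each normal to (0, inf)
    target += log_mix(theta,
                      normal_lpdf(y[i] | mu[1], sigma[1])
                        - normal_lccdf(0 | mu[1], sigma[1]),
                      normal_lpdf(y[i] | mu[2], sigma[2])
                        - normal_lccdf(0 | mu[2], sigma[2]));
  }
}
```

If the second component sits far from 0, its correction term is close to zero and could be dropped, which gives back the one-sided version shown above.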

Hey @xhackerz, welcome! To me your data doesn’t look Gaussian (normal). The first thing I notice is that your data is positive only and that it looks spiky on the left. Also, I guess mix_weight is supposed to be theta.

Try a mixture of gamma distributions instead. If you have a look at the Wikipedia article, it looks to me like you can reparameterise a gamma in terms of its mean and variance, so you can then use the same ordering trick you used with the normals. E.g.

$\mu=\frac{\alpha}{\beta}$, $\sigma=\frac{\alpha}{\beta^2}$ (here $\sigma$ is the variance, not the standard deviation), so $\beta=\frac{\mu}{\sigma}$, $\alpha=\frac{\mu^2}{\sigma}$
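
In case the algebra is not obvious, the inversion can be worked out in two steps. This is only a spelled-out version of the formulas above, with σ denoting the variance of the gamma distribution and β its rate.

```latex
\begin{align*}
\mu &= \frac{\alpha}{\beta}, \qquad \sigma = \frac{\alpha}{\beta^2} \\[4pt]
\frac{\mu}{\sigma} &= \frac{\alpha/\beta}{\alpha/\beta^2} = \beta \\[4pt]
\alpha &= \mu\,\beta = \frac{\mu^2}{\sigma}
\end{align*}
```

For instance, μ = 2 and σ = 0.5 give β = 4 and α = 8, and checking back, α/β = 2 and α/β² = 8/16 = 0.5.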

If you’re lucky, a mixture of 3 gammas may also account for that big spike you see on the left. Here’s a 2-component mixture example to get you started. I haven’t run it, but it’s something.

data {
  int <lower = 0> N;
  vector[N] y;
}

parameters {
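  positive_ordered[2] mu;          // component means, ordered for identifiability
  vector<lower=0>[2] sigma;        // component variances (not standard deviations)
  real<lower=0, upper=1> theta;    // mixing weight of the first component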
}

transformed parameters {
  vector<lower=0>[2] alpha= mu .* mu ./ sigma;
  vector<lower=0>[2] beta= mu ./ sigma;
}

model {
  sigma ~ normal(0, 1000);
  mu ~ normal(0, 1000);
  for (i in 1:N) {
    target += log_mix(theta,
                      gamma_lpdf(y[i] | alpha[1], beta[1]),
                      gamma_lpdf(y[i] | alpha[2], beta[2]));
  }
}

Forgive any mistakes; I wrote this super quickly. Hope it helps.
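
For the 3-component idea mentioned above, a sketch along the same lines could look like this. It is untested. The simplex mixing weights, the log_sum_exp accumulation (log_mix with a scalar weight only covers two components), and the normal(0, 5) priors are my assumptions and should be adapted to the scale of the data.

```stan
data {
  int<lower=0> N;
  vector<lower=0>[N] y;            // the gamma density is defined for y > 0
}

parameters {
  positive_ordered[3] mu;          // component means, ordered for identifiability
  vector<lower=0>[3] sigma;        // component variances
  simplex[3] theta;                // mixing weights, sum to 1
}

transformed parameters {
  // Mean/variance reparameterisation: alpha = mu^2 / sigma, beta = mu / sigma
  vector<lower=0>[3] alpha = mu .* mu ./ sigma;
  vector<lower=0>[3] beta = mu ./ sigma;
}

model {
  vector[3] log_theta = log(theta);

  // Placeholder priors (assumed, not taken from the thread)
  sigma ~ normal(0, 5);
  mu ~ normal(0, 5);

  for (i in 1:N) {
    // lp[k] = log(theta[k]) + log gamma density of y[i] under component k
    vector[3] lp = log_theta;
    for (k in 1:3) {
      lp[k] += gamma_lpdf(y[i] | alpha[k], beta[k]);
    }
    // Marginalize over the component label on the log scale
    target += log_sum_exp(lp);
  }
}
```

With no explicit prior on theta, the simplex gets an implicit uniform prior; a dirichlet prior could be added if one component should be favored.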


Yeah, it could be a gamma instead of a truncated normal.

I think you meant to have gamma_lpdf in both mixture components instead of that normal_lpdf in the second. EDIT: fixed


I’m trying to run this model, but I got

SYNTAX ERROR, MESSAGE(S) FROM PARSER:
No matches for: vector ./ real[ ]

at real<lower=0> alpha[2]= mu .* mu ./ sigma;

You can’t mix arrays and vectors. mu is a vector because it’s ordered, so make sigma a vector too. Also, I think mu has to be positive.

positive_ordered[2] mu;
vector<lower = 0>[2] sigma;

Change the type of mu to “positive_ordered” and the type of all the other arrays to “vector”. As the error indicates, real[] arrays do not appear to be supported in elementwise operations.
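
If you would rather keep sigma as an array, converting it with to_vector before the elementwise division should also get past the type error. A minimal, untested sketch follows. It uses the older array declaration syntax seen in this thread (recent Stan releases write `array[2] real<lower=0> sigma;` instead), and the priors are placeholders I made up so the program is complete.

```stan
parameters {
  positive_ordered[2] mu;          // vector type, required by the ordering constraint
  real<lower=0> sigma[2];          // array, as in the original model
}

transformed parameters {
  // to_vector turns the real array into a vector so ./ has matching operand types
  vector<lower=0>[2] alpha = mu .* mu ./ to_vector(sigma);
  vector<lower=0>[2] beta = mu ./ to_vector(sigma);
}

model {
  // Placeholder priors (assumed); the mixture likelihood would go here as well
  mu ~ normal(0, 5);
  sigma ~ normal(0, 5);
}
```

Declaring sigma as a vector, as suggested above, is still the simpler fix because it avoids the conversion altogether.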