I’m trying to fit a mixture model to the data below.
data.R (55.0 KB)
This is what I’ve got so far. How can I improve even more? I’m not sure what to do on the left part of the plot. Any help is appreciated.
data {
int <lower = 0> N;
vector[N] y;
}
parameters {
ordered[2] mu;
real <lower = 0> sigma[2];
real <lower = 0, upper = 1> theta;
}
model {
sigma ~ normal(0, 0.5);
mu[1] ~ normal(0, 2);
mu[2] ~ normal(4, 1);
mix_weight ~ beta(5, 5);
for (i in 1:N) {
target += log_mix(theta,
normal_lpdf(y[i] | mu[1], sigma[1]),
normal_lpdf(y[i] | mu[2], sigma[2]));
}
}
Your data looks like it is truncated. Maybe modeling the truncation would improve the fit:
target += log_mix(theta,
normal_lpdf(y[i] | mu[1], sigma[1]) - normal_lccdf(0| mu[1], sigma[1]),
normal_lpdf(y[i] | mu[2], sigma[2]));
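For anyone wondering where the subtracted term comes from, here is a short sketch of the reasoning, assuming the truncation point is 0 and that only the first component has noticeable mass below it. A normal restricted to y > 0 has to be renormalised by the probability of landing above 0:

$$
p(y \mid y > 0, \mu_1, \sigma_1) = \frac{\mathcal{N}(y \mid \mu_1, \sigma_1)}{1 - \Phi\left(\frac{0 - \mu_1}{\sigma_1}\right)}, \qquad y > 0
$$

$$
\log p(y \mid y > 0, \mu_1, \sigma_1) = \log \mathcal{N}(y \mid \mu_1, \sigma_1) - \log\left(1 - \Phi\left(\frac{0 - \mu_1}{\sigma_1}\right)\right)
$$

The last term is what normal_lccdf(0 | mu[1], sigma[1]) computes. If the second component also puts non-negligible mass below 0, the same correction would apply to it with mu[2] and sigma[2].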
Hey @xhackerz, welcome! To me your data doesn’t look Gaussian (normal). The first thing I notice is that your data is positive only and that it looks spiky on the left. Also, I guess mix_weight is supposed to be theta.
Try a mixture of gamma distributions instead. If you have a look at the Wikipedia article, it looks to me like you can reparameterise a gamma to use mean and variance, so then you can use your ordering trick like you did with the normals. E.g.
$\mu=\frac{\alpha}{\beta}$, $\sigma=\frac{\alpha}{\beta^2}$ (here $\sigma$ is the variance, not the standard deviation), so $\beta=\frac{\mu}{\sigma}$, $\alpha=\frac{\mu^2}{\sigma}$
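Spelled out, the inversion goes roughly like this, writing $\sigma$ for the variance of a gamma with shape $\alpha$ and rate $\beta$ as in the line above:

$$
\begin{aligned}
\mu &= \frac{\alpha}{\beta}, \qquad \sigma = \frac{\alpha}{\beta^2} \\
\frac{\mu}{\sigma} &= \frac{\alpha / \beta}{\alpha / \beta^2} = \beta \\
\alpha &= \mu \beta = \frac{\mu^2}{\sigma}
\end{aligned}
$$

So any positive mean and variance map to a valid shape and rate, which is what lets the ordering constraint sit on the means.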
If you’re lucky, a mixture of 3 gammas may also account for that big spike you see on the left. Here’s a 2-component mixture example to get you started. I haven’t run it but it’s something.
data {
int <lower = 0> N;
vector[N] y;
}
parameters {
positive_ordered[2] mu;
vector<lower=0>[2] sigma;
real<lower=0, upper=1> theta;
}
transformed parameters {
vector<lower=0>[2] alpha= mu .* mu ./ sigma;
vector<lower=0>[2] beta= mu ./ sigma;
}
model {
sigma ~ normal(0, 1000);
mu ~ normal(0, 1000);
for (i in 1:N) {
target += log_mix(theta,
gamma_lpdf(y[i] | alpha[1], beta[1]),
gamma_lpdf(y[i] | alpha[2], beta[2]));
}
}
Forgive any mistakes; I wrote this super quickly. Hope it helps.
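In case it is useful, here is a rough, untested sketch of how the 3-component version mentioned above might look. The assumptions are mine: a simplex takes the place of the single theta, log_sum_exp over the three components takes the place of log_mix, and the priors are simply copied from the 2-component example, so they will probably need tightening for real data.

```stan
data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  positive_ordered[3] mu;      // component means, ordered for identifiability
  vector<lower=0>[3] sigma;    // component variances (not standard deviations)
  simplex[3] theta;            // mixing weights, constrained to sum to 1
}
transformed parameters {
  vector<lower=0>[3] alpha = mu .* mu ./ sigma;  // gamma shape
  vector<lower=0>[3] beta = mu ./ sigma;         // gamma rate
}
model {
  vector[3] log_theta = log(theta);  // cache the log mixing weights
  // Priors copied from the 2-component example; an assumption, not a recommendation
  sigma ~ normal(0, 1000);
  mu ~ normal(0, 1000);
  for (i in 1:N) {
    vector[3] lp = log_theta;
    for (k in 1:3) {
      lp[k] += gamma_lpdf(y[i] | alpha[k], beta[k]);
    }
    // marginalise over the component indicator
    target += log_sum_exp(lp);
  }
}
```

Whether the third component actually picks up the spike on the left will depend on the data, so it is worth comparing posterior predictive draws against the histogram.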
Yeah, it could be a gamma instead of a truncated normal.
I think you meant to have gamma_lpdf in both mixture components instead of that normal_lpdf in the second. EDIT: fixed
I’m trying to run this model but I got:
SYNTAX ERROR, MESSAGE(S) FROM PARSER:
No matches for: vector ./ real[ ]
at real<lower=0> alpha[2]= mu .* mu ./ sigma;
Can’t mix arrays and vectors. mu is a vector because it’s ordered, so make sigma a vector too. Also, I think mu has to be positive.
positive_ordered[2] mu;
vector<lower = 0>[2] sigma;
Change the type of mu to “positive_ordered” and the type of all the other arrays to “vector”. As the error indicates, it looks like real[] arrays are not supported in elementwise operations with vectors.