Hi,

I have developed the following mixture model to infer the class distribution from the thresholds created by some upstream binary classification models. The problem is that the model seems to work for populations where the mixing proportions are around 20%-80%, but not as well when they are around 2%-98% (i.e., skewed mixtures).

I am looking for some direction on modeling with Stan when the mixing distribution is skewed. Thanks.

Here is my model code:

```
data {
  int<lower=0> J; // number of cases
  real scores[J]; // score of each transaction
  real mu_s[2];
  real<lower=0> sigma_s[2];
  vector<lower=0>[2] alpha;
}

parameters {
  simplex[2] theta;
  real conj_mu_s[2];
  real conj_sigma_s[2];
}
```

```
model {
  conj_sigma_s[1] ~ gamma(.5, sigma_s[1]);
  conj_sigma_s[2] ~ gamma(.5, sigma_s[2]);
  conj_mu_s[1] ~ normal(mu_s[1], conj_sigma_s[1]);
  conj_mu_s[2] ~ normal(mu_s[2], conj_sigma_s[2]);
  theta ~ dirichlet(alpha);
  for (n in 1:J) {
    real gamma[2];
    for (k in 1:2) {
      gamma[k] = log(theta[k]) + normal_lpdf(scores[n] | conj_mu_s[k], conj_sigma_s[k]);
    }
    //increment_log_prob(log_sum_exp(gamma)); // likelihood
    target += log_sum_exp(gamma);
  }
}
```
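
In case it helps to reproduce the issue outside Stan, here is a minimal Python sketch of the marginalized likelihood that the loop in the model block accumulates: for each score, the log of the sum over components of `theta[k]` times the normal density. It is only an illustration under assumptions. The function names, the component means and standard deviations, the sample size, and the simulated data are all made up and are not my real scores. The component parameters are also held fixed here rather than sampled as in the model.

```python
import math
import random


def normal_lpdf(x, mu, sigma):
    """Log density of Normal(mu, sigma) at x, same parameterization as Stan's normal_lpdf."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def mixture_loglik(scores, theta, mu, sigma):
    """Sum over scores of log(sum_k theta[k] * Normal(score | mu[k], sigma[k]))."""
    total = 0.0
    for x in scores:
        terms = [
            math.log(theta[k]) + normal_lpdf(x, mu[k], sigma[k])
            for k in range(len(theta))
        ]
        total += log_sum_exp(terms)
    return total


def simulate_scores(n, theta, mu, sigma, seed=0):
    """Draw n scores from a two-component normal mixture (hypothetical test data)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        k = 0 if rng.random() < theta[0] else 1
        out.append(rng.gauss(mu[k], sigma[k]))
    return out


if __name__ == "__main__":
    # Illustrative values only: a 2%-98% mixture with made-up component parameters.
    mu, sigma = [0.0, 3.0], [1.0, 1.0]
    scores = simulate_scores(2000, [0.02, 0.98], mu, sigma)
    # Compare the log-likelihood at a few candidate mixing proportions.
    for p in (0.02, 0.10, 0.20):
        print(p, mixture_loglik(scores, [p, 1.0 - p], mu, sigma))
```

Evaluating `mixture_loglik` over a grid of mixing proportions on simulated 20%-80% and 2%-98% data is one way to check whether the likelihood itself still distinguishes the minority proportion in the skewed case, separately from any effect of the priors.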

Here is a figure for a case where I think the model is not working (it overestimates). Gray is the Stan output, red is the truth, and blue is an alternative method.