# Finite mixture model where the sum of a group's characteristics is known

I am working on an unusual finite mixture model to try to tackle a historical source. The problem combines a two-component normal finite mixture with some additional information, and I am unsure how to incorporate that additional information into my model. I want both to estimate the means of the mixture components and to get posterior probabilities of which component each element belongs to. I’m posting here to beg for smart ideas.

I have a historical source which provides me with the following information: there are N companies indexed i divided into G groups. The companies are of two types z \in \{0,1\}, and associated with each company is an outcome y_{ig} for which I strongly believe that y_{ig}^{(1)} \sim N(\mu_1, \sigma_1) and y_{ig}^{(0)} \sim N(\mu_0, \sigma_0) with \mu_1 > \mu_0. In other words, a mixture of normals.

For each group g the source tells me how many companies are of type 0 (n_0) and how many are of type 1 (n_1). Thus within each group g \in G I know the mixing probability p_g = \frac{n_{1g}}{n_{0g} + n_{1g}} (really I’d rather say I have a very strong prior on \lambda_g \sim N(p_g, \sigma_p) as historical sources are error prone, and I’ll do that a bit further down).

So far, this is a straightforward finite-mixture problem with known mixing probabilities as is well-described in the user guide on finite mixtures as well as in the great case study by Betancourt on identifying mixture models. There is the small wrinkle of having a mixing parameter per group but that seems surmountable. In Stan I wrote this to look like:

```stan
data {
  int<lower=0> N;
  vector[N] y;
  vector[N] prop; // prior proportions
}

parameters {
  ordered[2] mu;
  real<lower=0> sigma[2];
}

model {
  sigma ~ normal(0.1, 0.1);
  mu ~ normal(0.3, 0.1);
  for (n in 1:N) {
    target += log_mix(prop[n], // mixture probability
                      normal_lpdf(y[n] | mu[1], sigma[1]),
                      normal_lpdf(y[n] | mu[2], sigma[2]));
  }
}
```



and on simulated data this appears to work.
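For completeness, here is a minimal sketch of how such data could be simulated (in Python rather than the attached R script; the parameter values are purely illustrative, roughly matching the priors above):

```python
import random

random.seed(1)

# Illustrative parameter values only (not the real estimates)
mu = [0.2, 0.4]        # component means, mu[0] < mu[1]
sigma = [0.05, 0.05]   # component standard deviations
N = 200
prop = [0.35] * N      # per-observation weight on the first component

y, z = [], []
for n in range(N):
    zn = 0 if random.random() < prop[n] else 1  # pick a component
    z.append(zn)
    y.append(random.gauss(mu[zn], sigma[zn]))   # draw the outcome
```

Fitting the Stan program to `y` and `prop` should then recover `mu` and `sigma` reasonably well.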

However, the source also tells me the total capitalization of each type of company within each group. Thus if a company’s nominal market capitalization is w_{ig} the source reports \sum_{i} z_{ig} w_{ig} = C_g. In words, the source reports information like “for this list of 20 companies, 7 are good, 13 are bad, and the aggregate capitalization of the good companies is 2 million”. Clearly, this is useful additional information and there should be some way for me to exploit it!

For instance, in one case the source reports 1 good company and there is only one company for which w_{ig} = C_g, and therefore I can recover the vector of z_i's for that group. It seems reasonable to me that, given the vector \mathbf{w_g} and the constraint C_g, some elements w_{ig} are more likely to be part of the solution than others. For instance, a company whose capitalization exceeds the constraint is obviously excluded, and one that is large would eliminate lots of other solutions, thus making its inclusion unlikely. Googling suggests this is what computer scientists call the ‘subset sum’ problem, but so far that information hasn’t helped me find papers that estimate the probability that an element is a component of the solution.
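For small groups this "subset sum with fixed cardinality" can simply be enumerated, which also gives the empirical frequency with which each element appears across solutions. A brute-force sketch (variable names are made up, and this is only feasible when the number of size-n_1 subsets is manageable):

```python
from itertools import combinations

def inclusion_freqs(w, C, n1):
    """Enumerate all size-n1 index subsets of w summing exactly to C;
    return the number of solutions and, for each element, the fraction
    of solutions that include it."""
    sols = [s for s in combinations(range(len(w)), n1)
            if sum(w[i] for i in s) == C]
    freqs = [sum(i in s for s in sols) / len(sols) if sols else 0.0
             for i in range(len(w))]
    return len(sols), freqs
```

For example, with `w = [3, 5, 8, 9]`, `C = 12`, `n1 = 2`, the only solution is {3, 9}, so those two elements get inclusion frequency 1 and the others 0.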

One way I tried to tackle the problem was to create a length-N parameter \lambda_{ig} which I draw centered on the reported mixing probabilities p_g. I then add to the model the soft constraint C_g \sim N(\mathbf{\lambda_g}^T\mathbf{w_g}, \sigma_C), where I just pick a plausible value for \sigma_C. Basically, I am trying to get a better estimate of \lambda_{ig} by forcing it to approximately solve the constraints. My efforts are below:

```stan
data {
  int<lower=0> N;   // n obs
  int<lower=0> G;   // n groups
  int gs[G];        // group sizes, e.g. [20, 13, 22, ...]
  vector[N] y;      // outcomes
  vector[N] w;      // characteristics (capitalizations)
  vector[N] prop;   // prior proportions
  vector[G] C;      // group-level constraints
}

parameters {
  ordered[2] mu;
  real<lower=0> sigma[2];
  real<lower=0, upper=1> lambda[N];
}

model {
  int pos;                      // to cope with ragged group indexing
  sigma ~ normal(0.1, 0.1);     // small-variance prior
  mu ~ normal(0.3, 0.1);        // tight mean prior
  lambda ~ normal(prop, 0.1);   // truncated normal around the reported proportion
  pos = 1;                      // start indexing by position
  for (g in 1:G) {
    // the constraint is normally distributed around the weighted sum;
    // 0.1 is a hand-picked plausible value for sigma_C
    C[g] ~ normal(dot_product(to_vector(segment(lambda, pos, gs[g])),
                              segment(w, pos, gs[g])),
                  0.1);
    pos = pos + gs[g]; // update position
  }

  for (n in 1:N) { // estimate the mixture model for each observation
    target += log_mix(lambda[n], // mixture probability
                      normal_lpdf(y[n] | mu[1], sigma[1]),
                      normal_lpdf(y[n] | mu[2], sigma[2]));
  }
}

generated quantities {
  real p1[N]; // posterior probability that z_i = 1
  for (n in 1:N) {
    p1[n] = exp(log(lambda[n]) + normal_lpdf(y[n] | mu[1], sigma[1])
                - log_sum_exp(log(lambda[n]) + normal_lpdf(y[n] | mu[1], sigma[1]),
                              log1m(lambda[n]) + normal_lpdf(y[n] | mu[2], sigma[2])));
  }
}
```
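As a cross-check, the membership probability computed in `generated quantities` is just Bayes' rule over the two component densities, and can be reproduced outside Stan (a sketch with made-up parameter values):

```python
import math

def normal_lpdf(y, mu, sigma):
    """Log density of N(mu, sigma) evaluated at y."""
    return (-0.5 * ((y - mu) / sigma) ** 2
            - math.log(sigma) - 0.5 * math.log(2 * math.pi))

def p_component1(y, lam, mu, sigma):
    """P(observation belongs to component 1 | y), mirroring p1[n] above."""
    lp1 = math.log(lam) + normal_lpdf(y, mu[0], sigma[0])
    lp2 = math.log1p(-lam) + normal_lpdf(y, mu[1], sigma[1])
    m = max(lp1, lp2)  # stabilize the log-sum-exp
    return math.exp(lp1 - m) / (math.exp(lp1 - m) + math.exp(lp2 - m))
```

For a point exactly midway between two equal-variance components with `lam = 0.5`, this returns 0.5, as it should.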


This is not working as I wished it might (although perhaps working as I expected it would!), insofar as the additional constraint doesn’t really seem to result in the individual \lambda_{ig} parameters being more accurately estimated. The means and variances are estimated accurately:

But in general the individual \lambda_i parameters don’t really depart from their priors as you can see in the figure below where I plot some of the posterior mixing coefficients against the true z_i values:

As you can see, there isn’t a lot of variation in the values of \lambda_i, and they look pretty much like their prior. This makes me think maybe they are not identified, or I am not parameterizing the problem in a meaningful way, or there are so many ways to satisfy the constraint that it adds little information to the model. I thought perhaps I would need to draw z_{ig} given \lambda, but I don’t know how to do that in Stan, as it would seem to entail a discrete parameter.

Any ideas about how to incorporate this constraint information would be greatly appreciated! I’m also open to any broader thoughts, suggestions, or criticisms of how I am approaching the problem. (Stan and R code attached: bayes_mixture.R (1.7 KB), bayes_mixture.stan (1.4 KB).)


Hi, that’s an interesting problem. It might actually be more of an algorithms/computer science/operations research/combinatorial problem than a statistical one. I am honestly unsure how to best use the sum constraint; it sounds hard. The issue is that the structure enforced by the sum constraint can be very complex, and the intuition that a large capitalization by itself makes inclusion unlikely is not completely supported. A few examples to show some of the complexity:

1. w_g = \{2, 4, 6, 8, 10, 11\}, C_g = 13, n_1 = 2 - there is only one solution (and it involves the biggest number): \{2, 11\}
2. w_g = \{1, 3, 4, 5, 7, 10\}, C_g = 15, n_1 = 3 - there are two completely disjoint solutions: \{3, 5, 7\} and \{1, 4, 10\}
3. w_g = \{1, 1, 1, 1, 1, 1, 1, 1\}, C_g = 3, n_1 = 3 - there are 56 solutions (any subset of three will do)
4. w_g = \{1, 1, 1, 1, 1, 1, 1, 1\}, C_g = 4, n_1 = 3 - there are no solutions
5. w_g = \{1, 1, 1, 1, 1, 1, 1, 2\}, C_g = 4, n_1 = 3 - there are 21 solutions, all including the last element
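These counts are easy to verify by brute force (a quick sketch):

```python
from itertools import combinations

def count_solutions(w, C, n1):
    """Number of size-n1 index subsets of w whose values sum to C."""
    return sum(1 for s in combinations(range(len(w)), n1)
               if sum(w[i] for i in s) == C)

cases = [([2, 4, 6, 8, 10, 11], 13, 2),  # example 1
         ([1, 3, 4, 5, 7, 10], 15, 3),   # example 2
         ([1] * 8, 3, 3),                # example 3
         ([1] * 8, 4, 3),                # example 4
         ([1] * 7 + [2], 4, 3)]          # example 5
counts = [count_solutions(w, C, n1) for w, C, n1 in cases]
# counts == [1, 2, 56, 0, 21]
```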

First, notice how even a small change in the input can drastically alter the solution space (this is very typical of NP-complete problems, of which this is an example).

Second, it is not very meaningful to speak about the probability of a single element being in the solution - you need to consider the whole solution as a unit. For example, in case 2 each element is part of exactly one solution (so in some sense has a 50% probability of being in the solution), but once you choose a single element, all the others are determined.

Technically, you have an additional constraint on top of the subset-sum problem in that you also specify the number of elements, so it is not exactly the subset-sum problem, but I am quite confident (although I cannot prove it right now) that this does not change the problem fundamentally.

So what to do about this? If you can trust your source to have the valuations and sums consistent, you can try to enumerate all solutions for each group via exhaustive search (most likely outside of Stan). If there are not too many, you can treat the response as a mixture over all the possible solutions (e.g. if you have k solutions, define a simplex[k] solution_probs that defines the mixture probabilities of each solution). Each solution then identifies the component for all data points. This will, however, be impractical if k is large - in that case, I would probably look for elements that appear in all solutions or in none, exclude them from the mixture, and then just ignore the constraint.
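To make the mixture-over-solutions idea concrete: given k enumerated solutions, a group's marginal log likelihood is a log-sum-exp over the complete-data log likelihoods implied by each solution. A Python sketch with uniform solution weights standing in for the simplex (`mu`, `sigma`, and the data are placeholders):

```python
import math
from itertools import combinations

def normal_lpdf(y, mu, sigma):
    """Log density of N(mu, sigma) evaluated at y."""
    return (-0.5 * ((y - mu) / sigma) ** 2
            - math.log(sigma) - 0.5 * math.log(2 * math.pi))

def group_log_lik(y, w, C, n1, mu, sigma):
    """log p(y) marginalized over all size-n1 subsets of w summing to C,
    with uniform weights over solutions (in Stan, a simplex parameter)."""
    sols = [set(s) for s in combinations(range(len(w)), n1)
            if sum(w[i] for i in s) == C]
    lps = []
    for s in sols:
        lp = -math.log(len(sols))  # uniform solution weight
        for i, yi in enumerate(y):
            comp = 1 if i in s else 0  # solution fixes each company's type
            lp += normal_lpdf(yi, mu[comp], sigma[comp])
        lps.append(lp)
    m = max(lps)  # log-sum-exp over solutions
    return m + math.log(sum(math.exp(lp - m) for lp in lps))
```

When there is exactly one solution, this reduces to the complete-data log likelihood with every company's type known.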

If you can’t trust the source to have this correct, this introduces additional complexity (one could e.g. look for solutions within some tolerance, but this would likely notably inflate the number of solutions).


Dear @martinmodrak , thanks so much for your thoughtful response! It’s been very helpful in clarifying my thinking around this problem.

You are quite right that (1) it is not that meaningful to think about the probability that an element is a component of a solution, as that cannot be assessed independently of all the other elements of the solution, and (2) my intuitions about what makes an element likely to be in the solution were quite wrong.

Your suggestion of getting the sets of solutions outside the model and putting a simplex on the set of solutions is very clever but I just don’t trust the source enough to take the constraints as hard rather than soft, and with soft constraints I think this will blow up too much.

In the end, I am moving towards a slight re-parameterization where I draw the mixing weights uniformly for each observation and then update them given the mixing proportions reported in the source. That way, in those cases where the source lets me determine a company’s type for sure outside the model, I just set the data for that observation to 0 or 1. So the mixing weights are parameterized:

\lambda_i \sim \text{Beta}(1, 1)
\text{prop}_i \sim N(\lambda_i, \sigma_{prop})

where prop is what the source tells me the mixing weights are, or what I can figure out from the source. Thus prop might look like [0, 1, 0.5, 0.5, 0.3, 0.3, 1, 0, 0, ...] depending on how I manage to translate the source into actual classifications given the constraint. Hopefully this lets me take advantage of the extra information where I can!
