Hi there,
I need advice on modeling my use case. Let's assume you have a population of X items, of which Y are sampled for evaluation. Each item i in the population is selected to be evaluated with probability p_{i} (similar to importance sampling, so we can assume weights w_{i} = 1 / p_{i}). Each item that ends up in the experiment gets a label that is either zero or one. Given that we have received N positive labels out of Y trials, we want to infer the probability of success (seeing a positive label) for the overall population of size X.
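To make the setup concrete, here is a minimal simulation sketch in Python. It is only an illustration under assumptions that are mine and not part of the real data: a small hypothetical population, a true success rate of 0.3, independent inclusion of each item with probability p_{i}, and positives being sampled more often than negatives. It compares the naive rate N / Y with the self-normalized weighted estimate using w_{i} = 1 / p_{i}. This gives only a point estimate, not the posterior I am after.

```python
import random


def simulate(X=20000, seed=0):
    """Simulate unequal-probability sampling from a toy population.

    All numbers here are hypothetical and chosen only to make the bias visible.
    """
    rng = random.Random(seed)

    # Toy population: binary labels with an assumed true success rate of 0.3.
    labels = [1 if rng.random() < 0.3 else 0 for _ in range(X)]

    # Assumed inclusion probabilities p_i: positives are sampled more often.
    incl = [0.10 if y == 1 else 0.02 for y in labels]

    # Each item enters the sample independently with probability p_i.
    sample = [(y, p) for y, p in zip(labels, incl) if rng.random() < p]

    Y = len(sample)                  # number of evaluated items
    N = sum(y for y, _ in sample)    # number of positive labels

    naive = N / Y                    # ignores the sampling weights

    # Self-normalized importance-weighted estimate with w_i = 1 / p_i.
    weights = [1.0 / p for _, p in sample]
    weighted = sum(w * y for w, (y, _) in zip(weights, sample)) / sum(weights)

    truth = sum(labels) / X
    return truth, naive, weighted, Y, N


if __name__ == "__main__":
    truth, naive, weighted, Y, N = simulate()
    print(f"Y={Y}, N={N}")
    print(f"truth={truth:.3f}, naive={naive:.3f}, weighted={weighted:.3f}")
```

In this toy setting the naive rate is far from the population rate, while the weighted estimate is close to it, which is why I cannot simply ignore the p_{i} values.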
If all Y items were selected with equal probability, i.e., with probability 1/X, then we would simply have a Beta-Binomial model, something like the following, and we could say that p is the success rate for the overall population.
model {
  p ~ Beta(1, 1);
  N ~ Binomial(Y, p);
}
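For reference, in this equal-probability case the model above is conjugate, so the posterior is available in closed form: p | N, Y ~ Beta(1 + N, 1 + Y - N). A small Python sketch of that calculation follows; the function name and the example counts are hypothetical and only for illustration.

```python
import math


def beta_binomial_posterior(N, Y, a=1.0, b=1.0):
    """Posterior of p under a Beta(a, b) prior and N successes in Y trials.

    Returns the posterior Beta parameters, the posterior mean, and the
    posterior standard deviation.
    """
    a_post = a + N
    b_post = b + (Y - N)
    total = a_post + b_post
    mean = a_post / total
    var = (a_post * b_post) / (total ** 2 * (total + 1.0))
    return a_post, b_post, mean, math.sqrt(var)


if __name__ == "__main__":
    # Hypothetical counts: 30 positive labels out of 100 evaluated items.
    a_post, b_post, mean, sd = beta_binomial_posterior(N=30, Y=100)
    print(f"Beta({a_post:.0f}, {b_post:.0f}), mean={mean:.4f}, sd={sd:.4f}")
```

This costs nothing to compute regardless of Y, which is the kind of scalability I would like to keep in the weighted case.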
My question is how we can model this and infer the overall success probability given that each item comes with a different weight. Note that the data is large (X is on the order of billions and Y is on the order of 100K), so the model should be scalable, and something like PoissonBinomial is not an option.
Thanks in advance for your help.