A different kind of non-central hypergeometric distribution?

Continuing the discussion from Constraining Output of Multinomial Distribution:

Over time, I have realized that the Wallenius non-central hypergeometric distribution is not the correct distribution for my needs. The reason is that I do not believe the probability of selection depends upon the number of items available.

Here is an example I constructed that illustrates the type of problem I am trying to solve:

Assume we have c different t-shirts, each with a different probability of being purchased, p_i, i=1...c. For each t-shirt, we have m_i, i=1...c copies of it.

Now, a customer is shown all of the t-shirts we have in stock (that is, each t-shirt where m_i > 0). They purchase one according to the probabilities of the shirts that are available. Once a t-shirt is purchased, its m_i is decremented by one and the next customer is shown the t-shirts that remain in stock.
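
In symbols, the rule described above could be sketched as follows. The notation is introduced here purely for illustration: $n_i^{(t)}$ is the number of copies of shirt $i$ left when customer $t$ arrives (so $n_i^{(1)} = m_i$), $A_t = \{ j : n_j^{(t)} > 0 \}$ is the set of shirts in stock, and $x_t$ is the shirt that customer $t$ buys. This assumes the probabilities of the available shirts are simply renormalized over $A_t$.

$$
P(x_t = i \mid A_t) = \frac{p_i \, \mathbf{1}[i \in A_t]}{\sum_{j \in A_t} p_j}
$$

Under the same assumption, the probability of one particular purchase sequence of length $T$ (one that never exceeds the stock) would be the product of these terms:

$$
P(x_1, \dots, x_T) = \prod_{t=1}^{T} \frac{p_{x_t}}{\sum_{j \in A_t} p_j}
$$

The probability of a vector of final counts would then require summing this expression over all orderings that lead to those counts.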

This example seems to be more of a “categorical sampling at each realization” scheme, where the probabilities only change if we “stock out” of an item.

Does this correspond to a Fisher non-central hypergeometric distribution? If not, does anyone know of a non-central hypergeometric distribution that satisfies this type of sampling?

If so, and c is large, does anyone have insight as to how to compute the denominator (the P_0) in the Fisher distribution?

I am a bit puzzled, as what you describe sounds exactly like simple sampling without replacement, which was already suggested in the original thread (Constraining Output of Multinomial Distribution - #3 by LucC). Could you explain once again why that would not work for you?

Best of luck with your model

Hey Martin. The difference is that the probability of selection differs across the items, which is not accounted for in the multivariate hypergeometric distribution; there, each probability is uniform and only depends upon the number of “balls in the urn.” In my case, the probabilities do not depend upon the total number of items, only on there being at least one item, but they do differ across the items.

So the way I am modeling it now is to use the Wallenius noncentral hypergeometric distribution, under the assumption that there is a latent probability for each “ball”, p_i, i=1,...,k. We then define the weights of the distribution as omega_i = p_i / m_i, where m_i denotes the number of balls of color i, so that the probability of drawing a ball is not dependent upon the number of items in the urn.
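
For reference, here is a sketch of what these weights imply. It assumes the usual Wallenius rule that, at each draw, a color is picked with probability proportional to its weight times the number of its balls still in the urn. The symbol $n_i$ is notation introduced here for that remaining count, with $n_i = m_i$ before the first draw.

$$
P(\text{draw color } i) = \frac{\omega_i \, n_i}{\sum_{j} \omega_j \, n_j} = \frac{p_i \, n_i / m_i}{\sum_{j} p_j \, n_j / m_j}
$$

On the first draw $n_i = m_i$ for every color, so this reduces to $p_i / \sum_j p_j$. On later draws the ratios $n_i / m_i$ remain in the expression.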

I don’t know if this is correct, but I have been unsuccessful in trying to derive the likelihood for a “sequential categorical distribution.”

Again, if anyone has any thoughts on this, I would be super appreciative!

So, just to be sure we are discussing the same thing: would the following R code represent simulations from the distribution you have in mind?

sample_this <- function(K, counts, probs) {
  N_categories <- length(counts)
  res <- rep(0, N_categories)
  for(k in 1:K) {
    new_item <- sample(1:N_categories, size = 1, prob = probs)
    res[new_item] <- res[new_item] + 1
    # Once a category is sold out, it can no longer be drawn
    if(res[new_item] >= counts[new_item]) {
      probs[new_item] <- 0
      # No need to renormalize probs, `sample` does it internally
    }
  }
  # Number of items drawn from each category
  res
}

sample_this(15, counts = c(1,5,10,30), probs = rep(0.25,4))

sample_this(3, counts = c(1,1,1,1, 2), probs = c(0.01,0.01,0.01,0.01,0.96))
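
As a rough sanity check on this kind of simulator, the sketch below repeats the same scheme under a hypothetical name, `sample_sequential`, so that the block runs on its own. The input guard and the zeroing of categories that start empty are additions assumed here, not part of the code above. It simulates many runs and looks at the average number of draws per category.

```r
# Hypothetical, self-contained variant of the simulator above (name and guards are assumptions).
sample_sequential <- function(K, counts, probs) {
  # Guard: K draws must be possible from the categories that can be selected
  stopifnot(length(counts) == length(probs), K <= sum(counts[probs > 0]))
  res <- rep(0, length(counts))
  probs[counts == 0] <- 0  # categories with no stock can never be drawn
  for (k in seq_len(K)) {
    # `sample` renormalizes `prob` internally, so zeroed entries are simply skipped
    new_item <- sample(seq_along(counts), size = 1, prob = probs)
    res[new_item] <- res[new_item] + 1
    # Once a category is sold out, remove it from later draws
    if (res[new_item] >= counts[new_item]) probs[new_item] <- 0
  }
  res
}

set.seed(1)
stock <- c(1, 1, 1, 1, 2)
draws <- replicate(2000, sample_sequential(3, counts = stock,
                                           probs = c(0.01, 0.01, 0.01, 0.01, 0.96)))
# Rows are categories, columns are simulations
rowMeans(draws)
```

Each column of `draws` is one simulated outcome, so the column sums should all equal `K` and no entry should exceed the corresponding stock.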