PDF for probability of selection without replacement (i.e., N choose K as likelihood)?

Say your DGP looks like a game show — the host awards two points each round to three contestants based on player skill in some challenge. Is there a PDF for this? I.e., I'm looking for the probability of selection for each element given \begin{pmatrix} N \\ K \end{pmatrix}. I derived one for the simple \begin{pmatrix} 3 \\ 2 \end{pmatrix} case, but it gets pretty unwieldy if you expand to larger values of N/K.

Hi, @markjrieke. Sorry this didn’t get answered sooner.

You’ll have to be more specific about what you mean by awarding two points each round based on player skill. Is there randomness, or do just the players with the top skills get points?

If you want to set it up as a logistic regression like the Bradley-Terry model, where skill is the log odds of winning, you can use the Plackett-Luce distribution. I wrote about that in a case study here:

There’s a pdf (the kind you view) checked in with my slides from StanCon. Others have also coded this model, like Bruno Nicenboim:

https://bruno.nicenboim.me/2021/03/21/a-simple-way-to-model-rankings-with-stan/

and Rasmus Bååth:

Based on the title of your post, the distribution you seem to describe is the hypergeometric distribution. If it’s not in Stan yet, it should be easy to code up based on a definition from, e.g., Wikipedia.
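For reference, here's a minimal pure-Python sketch of the hypergeometric PMF straight from the textbook definition (the function name and example numbers are just illustrative):

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    # P(draw exactly k marked items) when sampling n items without
    # replacement from a population of N containing K marked items.
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Example: population of 10, 4 marked, draw 3 without replacement.
print(hypergeom_pmf(2, 10, 4, 3))  # C(4,2) * C(6,1) / C(10,3) = 0.3
```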

Hey @Bob_Carpenter — no worries! Appreciate the follow up. From a quick scan through the docs you linked and @LucC’s mention, I think the hypergeometric distribution describes what I’m after (although the Bradley-Terry model gives a name to what I was using implicitly elsewhere in the model!). I’ll read in more detail later this week.


For some clarification, here was the simple derivation I came to for the \begin{pmatrix} 3 \\ 2 \end{pmatrix} case. Points are awarded probabilistically, with more skilled players being more likely to win a point. In each round, players can earn a max of one point, but two in total are awarded. If I pretend that the points in each round are awarded sequentially, then any path through the tree that ends with Player 1, for example, represents that player winning a point. Multiplying the probabilities along each path and summing gives:

\begin{align*} \text{Player 1 Point} &\sim \text{Bernoulli}(\Theta_1) \\ \Theta_1 &= \theta_1 + \theta_2 \frac{\theta_1}{\theta_1 + \theta_3} + \theta_3 \frac{\theta_1}{\theta_1 + \theta_2} \end{align*}

Where the vector \theta is given by the softmax of player skill parameters, \beta. Because two points are awarded in total, \sum \Theta = 2.
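That identity is easy to check numerically. A small sketch (the example simplex here is arbitrary):

```python
# Example simplex; any nonnegative vector summing to 1 works.
theta = [0.5, 0.3, 0.2]

def point_prob(i, theta):
    # Theta_i: player i wins the first point directly, or wins the
    # second point after player j takes the first and the remaining
    # probabilities renormalize.
    p = theta[i]
    for j in range(len(theta)):
        if j != i:
            rest = sum(theta[k] for k in range(len(theta)) if k != j)
            p += theta[j] * theta[i] / rest
    return p

total = sum(point_prob(i, theta) for i in range(3))
print(total)  # sums to 2, since two points are awarded in total
```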

I’m afraid I still don’t understand the rules from either this post or what you link. It sounds like the Plackett-Luce model, but I’m not 100% sure. The reason I say that is the trees you draw look similar, where there is a probability simplex over all of the players winning the first point, then the second point tosses out the first player and renormalizes the simplex.

That is, the generative process looks like this if the player probabilities of winning are represented by a simplex \theta \in \Delta^{N - 1} \subset \mathbb{R}^N that is assumed to be stable for subsequent choices (that’s the Luce axiom from which the model draws its name).

import numpy as np

def sample_two_from_plackett_luce(theta):
    # theta is assumed to be a simplex with len(theta) >= 2
    theta = np.asarray(theta, dtype=float)
    N = len(theta)
    # first point: categorical draw over all players
    first = np.random.choice(N, p=theta)
    # second point: drop the first winner and renormalize the simplex
    remaining_indices = np.delete(np.arange(N), first)
    remaining_theta = np.delete(theta, first)
    remaining_theta /= remaining_theta.sum()
    second = np.random.choice(remaining_indices, p=remaining_theta)
    return first, second

It’d probably be more efficient to rejection sample in the two-choice case, but this should get the point across.
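A sketch of that rejection approach, assuming the same simplex input. It's equivalent because a categorical draw conditioned on "not the first winner" is exactly the renormalized distribution:

```python
import random

def sample_two_by_rejection(theta, rng=None):
    # Draw the first winner, then redraw from the full simplex
    # until the second draw differs from the first.
    rng = rng or random.Random()
    players = range(len(theta))
    first = rng.choices(players, weights=theta)[0]
    second = first
    while second == first:
        second = rng.choices(players, weights=theta)[0]
    return first, second
```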

Note: the hypergeometric distribution describes the probability of drawing a given number of marked items when sampling from a population without replacement. It does not tell you which items exactly (the items are assumed exchangeable). Like Bob, I’m not entirely clear on what you’re after, and went by the title of the post.

Are you looking to model the distribution of

  • Points (per round or total)?
  • Number of individuals getting points?
  • The exact persons getting points, based on their skill level?
  • something else?

Yeah this should just be Plackett-Luce, I believe. You can write a PDF of this if you’re fine with a recursive definition, which is what your tree diagrams imply, in any case. If we assume there is a latent ordering, we can look at the probability that a given player is in the top K.

Let i \in \{1,\dots, N\} index players with \beta_i representing “skill” for i. Let:

s(i, J) = \frac{\exp(\beta_i)}{\sum_{j\in J} \exp(\beta_j)}

Where J \subseteq \{1, \dots, N\}. So s(i, J) is the softmax probability of player i winning from an index set of players J (we’ll assume always that i \in J). Next, let \pi(i, K, J) be the probability that player i is in the top K of index set J. We can decompose this as follows:

\pi(i, K, J) = s(i, J) + \sum_{j \not = i} s(j, J) \pi(i, K - 1, J\setminus \{j\})

The first term is the probability that i is the top player. The second term is the probability that i is in the top K - 1 of the remaining players after some other player has won. Each summand over j is the probability that j is the top player multiplied by the probability that i is in the top K - 1 of the index set J \setminus \{j\}. We can recognize the base case of:

\pi(i, 1, J) = s(i, J)

It’s fully defined, but as N and K grow, you’ll have lots of sums and I believe there are efficient implementations in some of the materials @Bob_Carpenter linked.
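For what it's worth, the recursion translates almost directly into code. A naive sketch (exponential in K, as noted; the beta values are illustrative):

```python
from math import exp

def s(i, J, beta):
    # softmax probability that player i wins among index set J
    return exp(beta[i]) / sum(exp(beta[j]) for j in J)

def top_k_prob(i, K, J, beta):
    # pi(i, K, J): probability player i finishes in the top K of J
    if K == 1:
        return s(i, J, beta)
    return s(i, J, beta) + sum(
        s(j, J, beta) * top_k_prob(i, K - 1, J - {j}, beta)
        for j in J
        if j != i
    )

beta = [0.4, 0.1, -0.5]
probs = [top_k_prob(i, 2, {0, 1, 2}, beta) for i in range(3)]
print(probs)  # exactly 2 of 3 players make the top 2, so these sum to 2
```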

Here’s my Plackett-Luce density definition for Stan:

real plackett_luce_lpmf(array[] int y, vector beta) {
    vector[size(y)] beta_y = beta[y];
    return sum(log(beta_y) - log(cumulative_sum(beta_y)));
}

Sorry there’s no error checking: it assumes that size(y) == rows(beta). y is sorted into ascending order and beta is a simplex representing choice probabilities over the full set. I have the ratings sorted in ascending order so that I can express this with a cumulative sum.
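For anyone without Stan at hand, here is a rough Python transcription of the same density under those assumptions (ranking in ascending order, beta a simplex); treat it as a sketch, not the canonical implementation:

```python
import math

def plackett_luce_lpmf(y, beta):
    # y: full ranking in ascending order (last entry is the winner);
    # beta: simplex of choice probabilities over all players.
    lp, csum = 0.0, 0.0
    for i in y:
        csum += beta[i]
        lp += math.log(beta[i]) - math.log(csum)
    return lp

# With beta = [0.5, 0.3, 0.2] and ranking worst-to-best [2, 1, 0],
# the likelihood is 0.5 * (0.3 / 0.5) = 0.3.
print(math.exp(plackett_luce_lpmf([2, 1, 0], [0.5, 0.3, 0.2])))
```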

The repo is at the link posted above.

Sorry I realize I made a mistake in my response.

I’m pretty sure what @markjrieke is describing is not purely Plackett-Luce, i.e., a distribution over full rankings, but rather the probability that an individual is above some threshold in a ranking. So in his general case, he is asking what the probability is that a given player is in the top K ranks from a total of N players.

This should be a sum over all ranking configurations that have the player with \mathrm{rank} \leq K where the probability of each particular full ranking configuration is given by the Plackett-Luce distribution.

I believe the recursive form I wrote above captures the sum over those configurations (it agrees with @markjrieke in the N=3, K=2 case).
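One way to convince yourself: enumerate all N! full rankings, sum the Plackett-Luce probabilities of those that put the player in the top K, and compare against the closed form from the (3 choose 2) derivation above. A sketch with illustrative numbers:

```python
from itertools import permutations

theta = [0.5, 0.3, 0.2]  # illustrative simplex of win probabilities
N, K = 3, 2

def pl_prob(ranking, theta):
    # Plackett-Luce probability of one full ranking (best player first).
    p, remaining = 1.0, set(ranking)
    for r in ranking:
        p *= theta[r] / sum(theta[j] for j in remaining)
        remaining.remove(r)
    return p

# P(player 0 in the top 2) by brute force over all rankings:
brute = sum(pl_prob(perm, theta)
            for perm in permutations(range(N))
            if 0 in perm[:K])

# Closed form from the (3 choose 2) derivation:
closed = (theta[0]
          + theta[1] * theta[0] / (theta[0] + theta[2])
          + theta[2] * theta[0] / (theta[0] + theta[1]))
print(brute, closed)  # the two agree
```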

This is a standard application of Plackett-Luce. Plackett’s paper is about just this problem. Here is the first sentence of the paper:

IN 1965 I was consulted by a bookmaker with the following problem. When there are 8 or more runners in the field, a punter may bet that a specified horse will be placed, i.e. finish in the first three.


Thanks for correcting my misunderstanding

Thank you all for the resources & discussion! Marking as answered