Modelling player strengths based on multiple observed rankings

fhu · May 4, 2024, 11:34am

There is a group of K players who play N games, and for each game we observe the players’ ranking. The goal is to estimate the individual player strengths.

The player strengths are $S_k \sim N(0, 0.5)$. In game $n$, player $k$ gives a performance $P_{nk} \sim N(S_k, 1)$. The ranking is determined by the player performances.
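To make the generative story concrete, here is a minimal Python simulation of it. It is a sketch under assumptions the post does not state: the 0.5 in N(0, 0.5) is read as a standard deviation (as in Stan's normal), each game uses a random subset of at least two players, and a higher performance means a better rank. The function name `simulate_games` is hypothetical.

```python
import random

def simulate_games(K, N, strength_sd=0.5, seed=1):
    """Simulate rankings from the model described above (a sketch).

    Assumptions not stated in the post: 0.5 is the standard deviation of
    the strength prior, each game involves a random subset of at least two
    players, and a higher performance earns a better (earlier) rank.
    """
    rng = random.Random(seed)
    # S_k ~ Normal(0, strength_sd), one strength per player (0-based ids)
    strengths = [rng.gauss(0.0, strength_sd) for _ in range(K)]
    rankings = []
    for _ in range(N):
        # Hypothetical choice of participants: a random subset of size >= 2
        players = rng.sample(range(K), rng.randint(2, K))
        # P_nk ~ Normal(S_k, 1) for each participant in this game
        perf = {k: rng.gauss(strengths[k], 1.0) for k in players}
        # Observed ranking: participants sorted from best to worst performance
        rankings.append(sorted(players, key=perf.get, reverse=True))
    return strengths, rankings

strengths, rankings = simulate_games(K=3, N=3)
for r in rankings:
    print("".join(str(k + 1) for k in r))  # 1-based labels, like "312"
```

Fitting a model to data simulated this way, where the true strengths are known, is a common way to check that an implementation recovers its parameters.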

The players which participate in a given game can be any subset of the K players.

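An example dataset: there are 3 players and 3 games, and the observed rankings are 312, 12 and 132 (each ranking lists the participating players in finishing order, so the second game involved only players 1 and 2).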
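Because games involve different numbers of players, the rankings form ragged data. One possible encoding, sketched below with a hypothetical `encode_rankings` helper, flattens them into a list of game sizes plus one flat list of player ids. It assumes single-digit, 1-based ids and that each string lists players from first to last place.

```python
def encode_rankings(rankings):
    """Flatten ranking strings such as "312" into ragged-array data.

    Hypothetical encoding: each string lists the participating players
    from first to last place using single-digit, 1-based ids. Returns the
    number of players in each game and one flat list of player ids.
    """
    sizes = []
    flat = []
    for r in rankings:
        ids = [int(ch) for ch in r]  # assumes single-digit player ids
        if len(set(ids)) != len(ids):
            raise ValueError(f"repeated player in ranking {r!r}")
        sizes.append(len(ids))
        flat.extend(ids)
    return sizes, flat

sizes, flat = encode_rankings(["312", "12", "132"])
print(sizes)  # [3, 2, 3]
print(flat)   # [3, 1, 2, 1, 2, 1, 3, 2]
```

A Stan program receiving data in this shape could slice out each game's ranking with `segment`, using the running sum of `sizes` as the start position.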

How would you model this with Stan?

jsocolar · May 30, 2024, 6:36pm

This question boils down to: what is the likelihood of an observed ordering of a vector of independent normal variates with potentially different means?
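Before looking for a closed form, the quantity in question can be estimated by brute force. The Python sketch below (the name `ordering_prob_mc` is hypothetical) draws each participant's performance from Normal(strength, 1) and counts how often the draws reproduce the observed order. It is useful as a reference value for checking other methods, not as a likelihood inside a sampler.

```python
import random

def ordering_prob_mc(means, draws=100_000, seed=0):
    """Monte Carlo estimate of P(X_1 > X_2 > ... > X_m).

    `means` lists the strengths of the participating players in their
    observed finishing order (winner first); each performance is
    X_j ~ Normal(means[j], 1), independent across players.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        x = [rng.gauss(mu, 1.0) for mu in means]
        # The draw reproduces the observed order if it is strictly decreasing
        if all(a > b for a, b in zip(x, x[1:])):
            hits += 1
    return hits / draws

print(ordering_prob_mc([0.0, 0.0, 0.0]))  # close to 1/6 for equal strengths
```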

It turns out that this question has a somewhat tractable answer based on the multivariate normal CDF; see the Mathematics Stack Exchange question “Compute probability of a particular ordering of normal random variables”.
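The reduction can be sketched as follows, assuming the players are listed in finishing order with strengths mu_1, ..., mu_m. Define the successive differences D_j = P_j - P_{j+1}; the observed order occurs exactly when every D_j > 0. The D_j are jointly normal with means mu_j - mu_{j+1}, variance 2, and covariance -1 between neighbours (they share one performance), so the probability is an (m-1)-dimensional normal CDF. The helper names below are hypothetical; the two-player case reduces to Phi((mu_1 - mu_2) / sqrt(2)).

```python
import math

def difference_mvn(means):
    """Mean vector and covariance of successive performance differences.

    For players listed in finishing order with strengths `means`, define
    D_j = P_j - P_{j+1}. The observed order occurs exactly when every
    D_j > 0, so P(order) = F(mu), the CDF of MVN(0, Sigma) evaluated at mu.
    """
    m = len(means)
    mu = [means[j] - means[j + 1] for j in range(m - 1)]
    sigma = [[0.0] * (m - 1) for _ in range(m - 1)]
    for j in range(m - 1):
        sigma[j][j] = 2.0            # Var(P_j) + Var(P_{j+1})
        if j + 1 < m - 1:
            sigma[j][j + 1] = -1.0   # D_j and D_{j+1} share P_{j+1}
            sigma[j + 1][j] = -1.0
    return mu, sigma

def two_player_prob(s_winner, s_loser):
    """Exact P(winner beats loser): the one-dimensional case, Phi(mu / sqrt(2))."""
    z = (s_winner - s_loser) / math.sqrt(2.0)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = difference_mvn([1.0, 0.5, 0.0])
print(mu)     # [0.5, 0.5]
print(sigma)  # [[2.0, -1.0], [-1.0, 2.0]]
print(two_player_prob(0.0, 0.0))  # 0.5
```

Beyond two players, evaluating F(mu) needs a multivariate normal CDF routine, which is exactly the hard part discussed next.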

However, Stan does not have a multivariate normal CDF function, and in general the multivariate normal CDF is hard to compute. Some progress in this direction has been made in Stan, e.g. in the forum threads “Multivariate normal CDF” and “Multivariate normal cdf” (those posts have nearly the same title but are different posts). Perhaps @spinkney or @martinmodrak has more to say about this?

fhu · May 30, 2024, 9:00pm

I’ve solved it with JAGS and dinterval.

# `one` is supplied as data equal to 1; dinterval then forces performances[1] > performances[2]
one ~ dinterval(performances[1], performances[2])
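For readers less familiar with JAGS: per the JAGS user manual, dinterval(t, c) takes the value 0 if t <= c[1], m if c[m] < t <= c[m+1], and M if t > c[M]. Observing `one` = 1 with the single cutpoint performances[2] therefore keeps only draws with performances[1] > performances[2]. The Python sketch below only mimics those semantics; the helper names are hypothetical, and chaining one constraint per adjacent pair is one way to extend the idea to more players, not necessarily what the poster did.

```python
def dinterval_value(t, cutpoints):
    """Mimic the value of JAGS's dinterval(t, c) for sorted cutpoints c:
    0 if t <= c[1], m if c[m] < t <= c[m+1], and M if t > c[M],
    which equals the number of cutpoints that t strictly exceeds.
    """
    return sum(1 for c in cutpoints if t > c)

def consistent_with_order(perf_in_order):
    """Hypothetical chain of constraints: performances listed winner first
    must each exceed the next one, i.e. every dinterval value equals 1."""
    return all(dinterval_value(a, [b]) == 1
               for a, b in zip(perf_in_order, perf_in_order[1:]))

print(dinterval_value(1.3, [0.2]))   # 1: allowed when `one` = 1 is observed
print(dinterval_value(-0.4, [0.2]))  # 0: ruled out when `one` = 1 is observed
```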

jsocolar · May 31, 2024, 1:15am

That’s a clever approach based on latent variables! The dimensionality of the auxiliary parameters is quite large (one for each player-by-game combination), and as such Stan, whose HMC sampler copes well with high-dimensional parameter spaces, might be a particularly good tool for estimation. You can use Stan’s ordered type to create, for each game, a vector of latent performance scores that is constrained to respect the observed ordering, and then give each element of these ordered vectors an independent normal distribution centered on the corresponding player’s strength.
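The mechanics of this suggestion can be sketched in Python. Stan's ordered type maps an unconstrained vector y to x_1 = y_1, x_k = x_{k-1} + exp(y_k), and automatically adds the log Jacobian sum(y_2, ..., y_K) to the target. Because ordered vectors are increasing, the sketch lists each game's players from last place to winner. The helpers are hypothetical stand-ins for what a Stan program would do internally, not Stan code.

```python
import math

def ordered_from_unconstrained(y):
    """Stan-style ordered transform: x_1 = y_1, x_k = x_{k-1} + exp(y_k).

    Returns the increasing vector x and the log absolute Jacobian
    determinant, sum(y_2..y_K).
    """
    x = [y[0]]
    for yk in y[1:]:
        x.append(x[-1] + math.exp(yk))
    return x, sum(y[1:])

def normal_lpdf(x, mu, sigma=1.0):
    """Log density of Normal(mu, sigma) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2 * math.pi)

def latent_game_lpdf(y, strengths, order_last_to_first):
    """Log density of one game's latent performances on the unconstrained scale.

    Hypothetical helper: `order_last_to_first` lists the 0-based players
    from last place to winner, because the ordered vector is increasing.
    """
    x, log_jac = ordered_from_unconstrained(y)
    lp = sum(normal_lpdf(xi, strengths[k]) for xi, k in zip(x, order_last_to_first))
    return lp + log_jac

# Game "312" (1-based): listed last-to-first with 0-based ids this is [1, 0, 2]
x, _ = ordered_from_unconstrained([-0.5, 0.0, 0.3])
print(x)  # increasing: about [-0.5, 0.5, 1.8499]
print(latent_game_lpdf([-0.5, 0.0, 0.3], [0.0, 0.0, 0.0], [1, 0, 2]))
```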

I haven’t verified analytically that this gives the right likelihood, but intuitively it should. If you’re confident that the dinterval solution works, then I’m pretty sure this must work too.

If there’s a good way to compute the multivariate normal CDF in Stan, you can use it to marginalize out all of these latent variables, which would probably yield gains in computational performance.
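For this particular model there may be a route that avoids a general multivariate normal CDF: because performances are independent given the strengths, the latent variables can be integrated out one at a time, from last place upward, as nested one-dimensional integrals. The Python sketch below does that with a fixed-grid trapezoid rule (a swapped-in numerical technique, not one proposed in this thread; the names and grid settings are assumptions) and sums the log-probabilities over the example games. Each game costs on the order of m times `grid_size` operations; whether this is fast and stable enough inside Stan is untested.

```python
import math

SQRT2 = math.sqrt(2.0)
INV_SQRT_2PI = 1.0 / math.sqrt(2.0 * math.pi)

def ordering_prob(means, grid_size=2001, width=8.0):
    """P(X_1 > X_2 > ... > X_m) for independent X_j ~ Normal(means[j], 1).

    Players are listed winner first. With h_m(x) = Phi(x - mu_m) and
    h_j(x) = integral_{-inf}^{x} phi(t - mu_j) h_{j+1}(t) dt,
    the answer is h_1(+inf), approximated on a truncated, evenly spaced grid.
    """
    if len(means) < 2:
        return 1.0
    lo, hi = min(means) - width, max(means) + width
    dt = (hi - lo) / (grid_size - 1)
    grid = [lo + i * dt for i in range(grid_size)]
    # h(x) = P(X_m < x) for the last-placed player
    h = [0.5 * (1.0 + math.erf((x - means[-1]) / SQRT2)) for x in grid]
    # Fold in the remaining players, from second-to-last up to the winner
    for mu in reversed(means[:-1]):
        f = [INV_SQRT_2PI * math.exp(-0.5 * (x - mu) ** 2) * hx for x, hx in zip(grid, h)]
        cum = [0.0]  # cumulative trapezoid integral of f from lo
        for i in range(1, grid_size):
            cum.append(cum[-1] + 0.5 * (f[i - 1] + f[i]) * dt)
        h = cum
    return h[-1]

def log_likelihood(strengths, games):
    """Sum of log P(observed order | strengths); games use 0-based ids, winner first."""
    return sum(math.log(ordering_prob([strengths[k] for k in g])) for g in games)

games = [[2, 0, 1], [0, 1], [0, 2, 1]]  # "312", "12", "132" with 0-based ids
print(ordering_prob([0.0, 0.0, 0.0]))  # close to 1/6
print(log_likelihood([0.2, -0.3, 0.1], games))
```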

martinmodrak · June 6, 2024, 9:02am

Note also that there are families of distributions specifically for ranking data, most notably the exploding logit (see e.g. Bruno Nicenboim’s blog post “A simple way to model rankings with Stan”). There are also a number of published papers on Bayesian modelling of various racing sports, which typically focus on modelling rankings and player strengths.
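For comparison, the exploding logit (also called the rank-ordered logit or Plackett-Luce model) treats a ranking as a sequence of choices: the winner is drawn from all participants with probability proportional to exp(strength), the runner-up from those remaining, and so on. It corresponds to replacing the normal performance noise with Gumbel noise, and its likelihood has a closed form. The Python sketch below is a generic version with hypothetical names, not the code from the linked post.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def exploding_logit_lpmf(order, strengths):
    """Log-probability of a ranking under the exploding logit.

    `order` lists 0-based player ids from winner to last place. At each
    stage the next finisher is chosen among the players not yet placed,
    with probability proportional to exp(strength).
    """
    lp = 0.0
    for i in range(len(order) - 1):  # last place is forced: log(1) = 0
        contenders = [strengths[j] for j in order[i:]]
        lp += strengths[order[i]] - log_sum_exp(contenders)
    return lp

print(exploding_logit_lpmf([2, 0, 1], [0.0, 0.0, 0.0]))  # -log(6), about -1.7918
games = [[2, 0, 1], [0, 1], [0, 2, 1]]  # the example rankings, 0-based
print(sum(exploding_logit_lpmf(g, [0.2, -0.3, 0.1]) for g in games))
```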
