Modelling the winner of board games

ldeschamps · October 19, 2020, 7:36pm

Hi!

I have a hard time wrapping my head around a model we’d like to fit.

The objective is to estimate the probability of winning of each member of a board-game group.

The deterministic part of the model would be relatively simple: an intercept per player quantifying the averaged probability of winning, and a series of hierarchical effects: one for game identity, one for the number of players playing.

However, the response is ragged: not every player was present at each game. This leads to another aspect: one player can have a high probability of winning when another is absent, but a lower chance when another is absent. The table would look like :

Would someone have an idea about how to model this? I could create predictors for the presence or absence of a player and fit individual models, but I feel that a multivariate model with all players would be far more interesting (and a statistical challenge for me. We could also use ranks or scores). However, the ragged structure of the response blocks me, and I have no idea about how to formulate a model that could deal with such a structure.

Would anybody have an idea?? The good thing is that we have A LOT of data points for some of the configurations :P

Thanks!
Lucas

Ara_Winter · October 19, 2020, 8:14pm

Maybe something like an occupancy model (from ecology) with predictors and accounts for correlation between players? https://fukamilab.github.io/BIO202/09-C-occupancy-models.html

mitzimorris · October 20, 2020, 1:17am

have you looked at attempts to hack the Elo ratings system for multiplayer games? (everyone agrees that hack is appropriate)

ldeschamps · October 20, 2020, 12:56pm

Hi!

Thank you for your answer, this is a great link (for other courses too :p). I found this pymc thread :

It seems that the “zero-inflation mixture” is unnecessary if you know which 0 are “absence related”, and you can make a brute force mask of the 0 proba for a player to win given he/she is absent (last post).

However, I like the mixture model approach proposed in the 4th post. It is unlikely to work in my case I think because : I do not have all combinations of player represented, and I am not patient enough to code the number of combinations…

Have a good day!
Lucas

Edit : error

ldeschamps · October 20, 2020, 12:59pm

Thanks!

I have read the approach, and I find it great! I do not know how to translate this algorithmic approach into probabilistic statement, though. I guess it could be possible to approximate it in a stan program?

I could be a great idea to dive deeper inside, and the result should be simpler that the one described in this link, because there is no team, only individual results.

Have a good day!
Lucas

isms · October 26, 2020, 7:12pm

Hi Lucas,

A couple other possibilities would be TrueSkill (discussion), TrueSkill2, the Stochastic Rank Ordered Logit model, or Plackett-Luce (overview, Stan implementation, 2012 paper on Bayesian version).

One way to avoid ragged arrays is to have data in tidy format like this:

game_id,game_kind,player_id,win
1,2,1,1
1,2,2,0
1,2,3,0
1,2,4,0
2,1,1,0
2,1,2,0
2,1,3,1

In order to keep things vectorized you can also book-keep which row a game starts on and how many players there are, like:

data {
    // constants
    int n_rows;
    int n_players;
    int n_games;
    int n_game_kinds;

    // tidy data
    int player_won[n_rows];    // outcome
    int game_id[n_rows];       // which game is this row about
    int player_id[n_rows];     // which player is this row about

    // metadata
    int<lower=1, upper=n_game_kinds> game_kind[n_games]; // e.g. chess, checkers, ...

    int<lower=1, upper=n_rows> game_start[n_games];      // which row does game j start in?
    int<lower=2> n_players_in_game_j[n_games];           // how many rows does game j comprise?
    ...
}
parameters {
    vector[n_players] latent_abilities;    // whatever you're trying to model
    ...
}
model {
    ...
    latent_abilities ~ std_normal();   // prior of some kind
    for (j in 1..n_games) {
        int start = game_start[j];
        int end = start + n_players_in_game_j[j];
        int players_j[n_players_in_game_j] = player_id[start:end];
        int results_j[n_players_in_game_j] = player_won[start:end];
        int game_kind_j = game_kind[game_id[j]];
        vector[n_players_in_game_j] abilities_j = latent_abilities[players_j];
        ...
        results_j ~ your_likelihood_here(abilities_j, game_kind_j, ...);
    }
}

That’s just a sketch, haven’t checked for syntax or off by one errors, but you can generalize from there, so instead of won in {0, 1} you might have rank in [1, 2, …] (e.g. in a horse race) or score (e.g. in a basketball game), or something more custom like # of seconds behind 1st place finisher (e.g. in a footrace).

ldeschamps · October 26, 2020, 8:33pm

Hi!

Wahou!!

These are excellent suggestions! I will take a read at the links you provided, thank you very much :)

Lucas

isms · October 27, 2020, 1:19pm

One more note: that data block is written to show what’s going on, it’s actually not very efficient or terse. You could write it like this using the segment() function and without all the intermediate variables:

...
    for (j in 1..n_games) {
        ...
        segment(player_won, game_start[j], n_players_in_game_j[j]) ~ your_likelihood_here(
            latent_abilities[segment(player_id, game_start[j], n_players_in_game_j[j])],
            game_kind[game_id[j]],
            ...
        );
    }

Bob_Carpenter · December 10, 2024, 9:49pm

~~Not sure why this one popped up again in my feed, but I didn’t see it the first time.~~ I figured it out—it’s the eight spam posts in the background. This one just really attract spam because it mentions games!

Given the model you outlined, you want something called a “discrete choice” model. This is just a multi-logit regression among the players given their abilities on the log odds scale (like Bradley-Terry, or rescaled ELO).

If there is a ranking of players, then you probably want something like a Plackett-Luce model (it’s just an iterative Bradley-Terry model).

Aida · July 15, 2025, 7:34pm

When I worked on analyzing multi-person games with inconsistent rosters, I ended up keeping my data in a long format, with a simple indicator for who was present and won each game. It took a bit of wrangling to get it clean, but it made fitting models like Plackett-Luce or Bradley-Terry much more manageable. The bigger pain for me was tracking player-specific interactions, especially when I suspected some folks had higher chances only when certain others weren’t around.

Seeing board games and sports migrate to these digital platforms reminded me how even traditional stuff is moving online. For things like digital cockfighting, there’s iSabong Asia, which actually lets people engage and wager in real time, no matter where they are.

Topic		Replies	Views
Multinomial Logit: probability of choice of a soccer action Modeling pystan , multinomial-response , logistic-regression	24	1986	March 9, 2021
Can a multinomial regression be condensed? Modeling	7	329	March 31, 2024
Raykar et al. Learning from Crowds model Modeling	6	1273	April 8, 2020
Doubts when modelling the probability of a team beating another team Modeling specification	3	591	November 23, 2020
How to model multinomial data with uneven numbers of options between items brms rstan , specification	6	607	May 30, 2020

Modelling the winner of board games

Related topics