Calculating loglikelihood for permutations / race outcomes

Hi all,

I am reading a paper about a hierarchical model that uses race data.

For each race j, they observe the finishing position of each driver i. From these finishing positions, they estimate the racing ability of every driver. They start with a model where the racing ability of driver i is constant across races:

$$\theta_{i,j} = \theta_i$$

and

$$\lambda_{i,j} = \exp(-\theta_{i,j})$$

In order to create a finishing order of the drivers from the racing abilities, they begin with the last driver. They choose a driver to finish in last place with probability proportional to

$$\lambda_{i,j}$$

Next, they choose the second-to-last driver from the remaining drivers, again with probability proportional to $\lambda_{i,j}$. This process is repeated until drivers $i_1$ and $i_2$ remain. The probability that driver $i_1$ finishes second is:

$$\frac{\lambda_{i_1,j}}{\lambda_{i_1,j} + \lambda_{i_2,j}}$$

In the step before that, there were three drivers left in the equation, say $i_1$, $i_2$ and $i_3$. In this case, driver $i_1$ would be chosen for third place with probability:

$$\frac{\lambda_{i_1,j}}{\lambda_{i_1,j} + \lambda_{i_2,j} + \lambda_{i_3,j}}$$

The probability that driver $i_2$ becomes third is:

$$\frac{\lambda_{i_2,j}}{\lambda_{i_1,j} + \lambda_{i_2,j} + \lambda_{i_3,j}}$$

Basically, in the model drivers drop out one by one until the last two drivers are left. Intuitively, each driver remains in the race for an exponential length of time with rate (inverse mean) parameter $\lambda_{i,j}$. The longer a driver remains in the race, the better the racing ability of this driver.
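To check my understanding of this generative story, here is a minimal simulation sketch in Stan (my own, not from the paper), assuming the rates lambda are given as data and filling the finishing order from last place upwards:

data {
  int<lower=2> n_drivers;
  vector<lower=0>[n_drivers] lambda;          // assumed known rates, one per driver
}
generated quantities {
  int finish_order[n_drivers];                // finish_order[k] = driver finishing k-th
  {
    vector[n_drivers] w = lambda;
    for (k in 1:n_drivers) {
      int pos = n_drivers - k + 1;            // fill last place first
      int pick = categorical_rng(w / sum(w)); // chosen with prob proportional to lambda
      finish_order[pos] = pick;
      w[pick] = 0;                            // this driver has now dropped out
    }
  }
}

Running this repeatedly with algorithm=fixed_param produces finishing orders in which low-rate (high-ability) drivers tend to end up near the front.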

First question: how exactly is the last-place driver selected? We know that, for independent $X_k \sim \text{Exponential}(\lambda_k)$, the probability that a given variable is the minimum is:

$$P\left(X_i = \min\{X_1, \dots, X_n\}\right) = \frac{\lambda_i}{\sum_{k=1}^{n} \lambda_k}$$
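For reference, this is the standard result obtained by integrating over the time at which driver $i$ drops out (my own derivation, not from the paper):

$$P\left(X_i = \min_k X_k\right) = \int_0^\infty \lambda_i e^{-\lambda_i t} \prod_{k \neq i} e^{-\lambda_k t}\, dt = \frac{\lambda_i}{\sum_{k=1}^{n} \lambda_k}$$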

In the paper the authors say the following:

“We choose a driver to finish in last place with probability proportional to $\lambda_{i,j}$.”

Let’s say there are 20 drivers. Why not compare the probabilities of each driver finishing last as follows:

$$P(\text{driver } i \text{ finishes last}) = \frac{\lambda_{i,j}}{\sum_{k=1}^{20} \lambda_{k,j}}, \qquad i = 1, \dots, 20,$$

and see which driver has the highest probability of finishing last? It’s not clear to me how they classify one of the 20 drivers as the last-place driver.

Next, let $R_{i,j}$ be defined so that $R_{i,j} = k$ means that driver $i$ finishes in $k$-th place in race $j$. If we have $J$ races and $I$ drivers, then we have the following likelihood function:

$$L(\theta) = \prod_{j=1}^{J} \prod_{k=2}^{I} \frac{\lambda_{i(k,j),\,j}}{\sum_{m:\, R_{m,j} \le k} \lambda_{m,j}},$$

where $i(k,j)$ denotes the driver with $R_{i,j} = k$, i.e. the driver finishing $k$-th in race $j$.
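Taking logs (my own rearrangement, not copied from the paper), the quantity that would be added to target in Stan is:

$$\log L(\theta) = \sum_{j=1}^{J} \sum_{k=2}^{I} \left[ \log \lambda_{i(k,j),\,j} - \log \sum_{m:\, R_{m,j} \le k} \lambda_{m,j} \right]$$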

I am trying to reproduce this in Stan code:

data {
int<lower=1> n_drivers;
int<lower=1> n_races;
int<lower=1, upper=n_drivers> driver_id[n_races];
int<lower=1, upper=n_drivers> race_results[n_drivers, n_races]; // finishing position of each driver in each race
}

parameters {
real<lower=0> tau_race_skills;
vector[n_drivers - 1] raw_race_skills;
}

transformed parameters {
vector[n_drivers] race_skills;
vector[n_drivers] lambda_race_skills;

for (driver in 1:(n_drivers - 1)) {
race_skills[driver] = raw_race_skills[driver];
}

race_skills[n_drivers] = -sum(raw_race_skills);
lambda_race_skills = exp(-race_skills);
}

model {
// priors
raw_race_skills ~ normal(0, tau_race_skills);

// likelihood

}

How do I write the log-likelihood function in my Stan code? In addition, how do I make sure that it follows the method where each driver drops out one by one?

Help and/or tips are much appreciated!!

data { 
int<lower=1> n_drivers; 
int<lower=1> n_races;
int<lower=1, upper=n_drivers> driver_id[n_races];
int<lower=1, upper=n_drivers> race_results[n_races, n_drivers]; // race_results[j, k] = id of the driver who finished k-th in race j
}

parameters {
real<lower=0> tau_race_skills;
vector[n_drivers - 1] raw_race_skills;
}

transformed parameters {
vector[n_drivers] race_skills;
vector[n_drivers] lambda_race_skills;

for (driver in 1:(n_drivers - 1)) {
    race_skills[driver] = raw_race_skills[driver];
}

race_skills[n_drivers] = -sum(raw_race_skills);    
lambda_race_skills = exp(-race_skills);
}

model {  
// priors 
raw_race_skills ~ normal(0, tau_race_skills);

// likelihood: drivers drop out one by one, starting from last place,
// until only the two drivers finishing first and second remain
for (race in 1:n_races) {
    for (finish_position in 2:n_drivers) {
        // find driver_id of the driver that finished kth place in race j
        int driver = race_results[race, finish_position];
        // sum the rates of the drivers that didn't drop out yet in race j,
        // i.e. those that finished in positions 1..finish_position
        real lambda_sum = 0;
        for (position in 1:finish_position) {
            lambda_sum += lambda_race_skills[race_results[race, position]];
        }
        // log-probability that driver i finished kth place in race j
        target += log(lambda_race_skills[driver]) - log(lambda_sum);
    }
}
}

Is the above Stan interpretation of the paper’s methodology correct? Please advise! Help is much appreciated!

Any time you have a custom probability you want to use, you can use the target += syntax to add the log probability directly to the log density that Stan samples.

So this shows how you can do that for a normal distribution: 7.3 Increment log density | Stan Reference Manual
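For instance (my own illustration of that manual section), a sampling statement like

y ~ normal(mu, sigma);

can be written as an explicit increment, equivalent up to an additive constant:

target += normal_lpdf(y | mu, sigma);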

For a bernoulli with probability p, you could replace:

y ~ bernoulli(p); // assume y is integer 0 or 1

with:

if(y == 0) {
  target += log(1 - p);
} else {
  target += log(p);
}

(though the second will be slower and isn’t vectorized)
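(For completeness, the vectorized built-in equivalent of that if/else, assuming y is declared as an array of integers, would be:

target += bernoulli_lpmf(y | p);

which adds the same log probability.)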

Here is a paper that might be related to the problem you are looking at: http://www.glicko.net/research/multicompetitor.pdf .

I think this video might be related to the problem you’re working with as well: Statistical Rethinking Winter 2019 Lecture 14 - YouTube