Hi all,
I am reading a paper about a hierarchical model that uses race data.
For each race, j, they observe the finishing position of driver i. From these finishing positions, they estimate the racing ability of every driver. They start with a model where the racing ability of driver i is constant across each race:
and
In order to create a finishing order of the drivers from the racing abilities, they begin with the last driver. They choose a driver to finish in last place with probability proportional to
Next, they choose the second-to-last driver from the remaining drivers again with probability proportional to lambda_i,j This process is repeated until drivers i,1 and i,2 remain. The probability that driver i,1 finished second is:
In the step before, there were three drivers left in the equation. In this case, the third place driver i,1 is chosen with probability:
The probability that driver i,2 becomes third is:
Basically, in the model drivers drop out one by one until the last two drivers are left. Intuitively, each driver remains in the race for an exponential length of time with rate (inverse mean) parameter, lambda_i,j. The longer a driver remains in the race, the better the racing ability of this driver.
First question: how exactly is the the last-place driver selected? We know that the probability that a variable is the minimum of a number of exponential random variables is:
In the paper the authors say the following:
“We choose a driver to finish in last place with probability proportional to ”
Let’s say there are 20 drivers. Why not compare the probabilities of each driver getting last as follows:
and see which driver has the highest likelihood to finish last. Its not clear to me how they classify one the 20 drivers as the last place driver
Next, let be defined so that
means that driver i finishes in k-th place in race j. If we have J races and I drivers then we have the following likelihood function:
I am trying to reproduce this in Stan-code:
data {
int<lower=1> n_drivers;
int<lower=1> n_races;
int<lower=1, upper=n_drivers> driver_id[n_races];
matrix[n_drivers, n_races] race_results // matrix with the finishing positions of each driver for all the races
}parameters {
real<lower=0> tau_race_skills;
vector[n_drivers - 1] raw_race_skills;
}transformed parameters {
vector[n_drivers] race_skills;
vector[n_drivers] lambda_race_skillsfor (driver in 1:(n_drivers - 1)) {
race_skills[driver] = raw_race_skills[driver];
}race_skills[n_drivers] = -sum(raw_race_skills);
lambda_race_skills = exp(-race_skills);
}model {
// priors
raw_race_skills ~ normal(0, tau_race_skills);// likelihood
}
How do I write the loglikelihood function in my stan code? In addition, how do I make sure that it follows the method where each driver drops out one-by-one?
Help and/or tips are much appreciated!!