Hi all,
I’m currently doing some modelling of Formula 1 race results with Stan and I got a little stuck, so I’d appreciate some advice!
I coded up the following likelihood, which I believe is Luce’s extension of the Bradley-Terry model to multiple objects, i.e. the Plackett-Luce model (see http://mayagupta.org/publications/PairedComparisonTutorialTsukidaGupta.pdf, linked elsewhere):
real compute_likelihood(vector cur_skills, int n_per_race) {
  real cur_lik = 0;
  // cur_skills is assumed ordered by finishing position (1 = winner).
  // The last position contributes log(1) = 0, so the loop stops early.
  for (cur_position in 1:(n_per_race - 1)) {
    // Skills of the current driver and everyone who finished behind them.
    vector[n_per_race - cur_position + 1] other_skills
        = cur_skills[cur_position:n_per_race];
    cur_lik += cur_skills[cur_position] - log_sum_exp(other_skills);
  }
  return cur_lik;
}
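For reference, here is an equivalent Python version I use to sanity-check the Stan function (assuming, as above, that the skills vector is ordered by finishing position):

```python
import math

def plackett_luce_loglik(skills):
    """Plackett-Luce log-likelihood of the observed finishing order.

    skills must already be ordered by finishing position
    (index 0 = winner), matching the Stan function above.
    """
    loglik = 0.0
    n = len(skills)
    for pos in range(n - 1):  # the last position contributes log(1) = 0
        rest = skills[pos:]   # current driver plus everyone behind them
        m = max(rest)         # stabilised log-sum-exp
        lse = m + math.log(sum(math.exp(s - m) for s in rest))
        loglik += skills[pos] - lse
    return loglik
```

With two drivers this reduces to the Bradley-Terry probability log σ(s₁ − s₂), and with equal skills it gives log(1/n!), which makes for handy checks.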
This is all working fine, I think. Here’s my problem though. I would love to model each skill as a mixture: either the race goes fine for a particular driver, or they have a problem (they might crash, or have an engine issue, etc.). If everything goes fine, their skill is unchanged. But if they have a problem, I’d like to subtract something, say a Gamma random variable with some reasonable parameters, from their skill. Whether or not there’s a problem is not observed (at least not in my current dataset), so I’d like to treat it as a latent variable.
Usually I’d just marginalise out the latents. But because there are up to 20 drivers, and the likelihood function involves all their skills, I believe I would have to sum over all 2^{20} combinations of problem/not problem for each driver, which is too many.
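To make the cost concrete, here is a brute-force marginalisation sketch in Python. The fixed deficits `deltas` stand in for the Gamma draws (in Stan these would be parameters) and `p` is a placeholder problem probability; the point is that the sum already has 2^n terms for n drivers:

```python
import itertools
import math

def plackett_luce_loglik(skills):
    # Same likelihood as the Stan function; drivers ordered by finish.
    loglik = 0.0
    for pos in range(len(skills) - 1):
        rest = skills[pos:]
        m = max(rest)
        loglik += skills[pos] - (m + math.log(sum(math.exp(s - m) for s in rest)))
    return loglik

def marginal_loglik(skills, deltas, p):
    """Marginalise the per-driver problem indicators by brute force.

    deltas[i] is the skill deficit if driver i has a problem; p is the
    problem probability. The sum has 2^n terms for n drivers, so this
    is only viable for small fields, not for n = 20.
    """
    n = len(skills)
    terms = []
    for z in itertools.product([0, 1], repeat=n):
        adjusted = [s - zi * d for s, zi, d in zip(skills, z, deltas)]
        log_prior = sum(zi * math.log(p) + (1 - zi) * math.log1p(-p) for zi in z)
        terms.append(log_prior + plackett_luce_loglik(adjusted))
    m = max(terms)  # log-sum-exp over all 2^n configurations
    return m + math.log(sum(math.exp(t - m) for t in terms))
```

As a check, with all deficits at zero the marginal likelihood collapses back to the plain Plackett-Luce likelihood, whatever p is.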
I hope the problem is clear. My current thought is that I might experiment with a random effect with a sparse prior instead, which might have a similar effect (i.e. zero most of the time, but sometimes a deviation). Ideally I’d like to constrain its sign to be negative, and I’m not sure what the best way would be for that. In any case, I’d be grateful for any suggestions!
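Concretely, what I have in mind for the sign constraint: put the sparse prior on a non-negative deficit and subtract it, i.e. in Stan declare something like `vector<lower=0>[n_drivers] deficit;` and use `skills - deficit`. A rough prior-predictive sketch in Python of the kind of horseshoe-style draws I mean (tau and the scale choices are placeholders, not fitted values):

```python
import math
import random

random.seed(0)

n_drivers = 20
tau = 0.1  # small global scale (assumed value, controls overall sparsity)

# Horseshoe-style draw: heavy-tailed local scale times a normal magnitude,
# negated so the effect can only hurt a driver's skill.
effects = []
for _ in range(n_drivers):
    local = abs(math.tan(math.pi * (random.random() - 0.5)))  # half-Cauchy(0, 1)
    effects.append(-abs(random.gauss(0.0, 1.0)) * local * tau)
```

The heavy-tailed local scale keeps most effects near zero while still allowing the occasional large negative deviation, which is roughly the problem/no-problem behaviour I’m after.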