Workaround for likelihood involving many mixtures

Hi all,

I’m currently doing some modelling of Formula 1 race results with Stan and I got a little stuck, so I’d appreciate some advice!

I coded up the following likelihood, which I believe to be Luce’s extension of the Bradley-Terry model to multiple objects (see http://mayagupta.org/publications/PairedComparisonTutorialTsukidaGupta.pdf, linked elsewhere):

    real compute_likelihood(vector cur_skills, int n_per_race) {

        real cur_lik = 0;

        // Plackett-Luce: the driver finishing in each position beats
        // everyone who has not yet finished, with probability
        // proportional to exp(skill); cur_skills must be ordered by
        // finishing position.
        for (cur_position in 1:(n_per_race - 1)) {
            // skills of all drivers still in the running, including
            // the one finishing at cur_position
            vector[n_per_race - cur_position + 1] other_skills
                = cur_skills[cur_position:n_per_race];

            cur_lik += cur_skills[cur_position] - log_sum_exp(other_skills);
        }

        return cur_lik;
    }
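
For reference, I call this from the model block roughly like this (ordered_skills is just a placeholder name for the skill vector sorted by the race's finishing order):

    // one race; ordered_skills must already be sorted by finishing position
    target += compute_likelihood(ordered_skills, n_per_race);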

This is all working fine, I think. Here’s my problem though. I would love to model each skill as a mixture: either the race goes fine for a particular driver, or they have a problem (they might crash, or have an engine issue, etc.). If everything goes fine, their skill is unchanged. But if they have a problem, I’d like to subtract something, say a Gamma random variable with some reasonable parameters, from their skill. Whether or not there’s a problem is not observed (at least not in my current dataset), so I’d like to treat it as a latent variable.
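
To make this concrete: for driver i in a given race I have in mind something like z_i ~ Bernoulli(p) and d_i ~ Gamma(a, b), with effective skill s_i - z_i d_i, where s_i is the underlying skill and z_i is the unobserved problem indicator.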

Usually I’d just marginalise out the latents. But because there are up to 20 drivers and the likelihood function involves all their skills, I believe I would have to sum over all 2^{20} problem/no-problem combinations across the drivers in each race, which is too many.
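
Written out, the marginal likelihood of a single race would be \sum_{z \in \{0,1\}^{20}} p(z) L(s - z \circ d), where z collects the problem indicators, and it is that outer sum over 2^{20} terms that is infeasible (the continuous d's are not the issue, since HMC handles those).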

I hope the problem is clear. My current thought is that I might experiment with a random effect with a sparse prior instead, which might have a similar effect (i.e. essentially zero most of the time, but occasionally a sizeable deviation). Ideally I’d like to constrain its sign to be negative, and I’m not sure of the best way to do that. In any case, I’d be grateful for any suggestions!
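
To show what I mean, here is a minimal sketch of the sign-constrained version (the exponential prior and all names are placeholders I haven't tested):

    data {
        int<lower=1> n_drivers;
    }
    parameters {
        vector[n_drivers] skill;
        // declared non-negative and subtracted below, so the
        // "problem" deviation can only lower a driver's skill
        vector<lower=0>[n_drivers] problem_effect;
    }
    transformed parameters {
        vector[n_drivers] effective_skill = skill - problem_effect;
    }
    model {
        skill ~ normal(0, 1);
        // concentrated near zero: usually negligible,
        // occasionally a large hit
        problem_effect ~ exponential(5);
        // the race likelihood would then use effective_skill, e.g.
        // compute_likelihood(effective_skill, n_per_race)
    }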

Hi,
I admit I have no idea how the Bradley-Terry model works (this is the first time I’ve heard of it :-) ), but since nobody else answered, I will give it a try.

Yes, that sounds correct. However, I would expect the probability that any single driver has a problem to be relatively low, so for some small K (say K = 5) the probability that more than K drivers have a problem in the same race should be negligible (right? I don’t follow Formula racing :-) ). You would then only need to average over the \sum_{k=0}^K {20 \choose k} combinations with at most K problems; for K = 5 that is 21700 terms, which is a lot, but not completely crazy?
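
Spelling the sum out: {20 \choose 0} + {20 \choose 1} + \dots + {20 \choose 5} = 1 + 20 + 190 + 1140 + 4845 + 15504 = 21700.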

That sounds reasonable, although sparse priors come with a lot of challenges and may not work very well in practice. An alternative assumption would be that drivers’ skills vary somewhat between races anyway, plus you need to allow for some big outliers due to problems, so a random effect with a heavy-tailed distribution like a Student-t might also be sensible.
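
A minimal sketch of what I have in mind (the degrees of freedom, scale, and all names are placeholder values, not a recommendation):

    data {
        int<lower=1> n_drivers;
        int<lower=1> n_races;
    }
    parameters {
        vector[n_drivers] mean_skill;
        // heavy-tailed per-race deviations: mostly small wiggles,
        // but fat tails allow the occasional big drop from a problem
        matrix[n_races, n_drivers] race_effect;
    }
    model {
        mean_skill ~ normal(0, 1);
        to_vector(race_effect) ~ student_t(3, 0, 0.5);
        // the effective skill of driver d in race r would be
        //   mean_skill[d] + race_effect[r, d]
        // and would feed into the race likelihood
    }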

Best of luck with your model!

I don’t know Bradley-Terry models, nor am I an expert on what I’m about to suggest :D. You could formulate the problem as a longitudinal HMM with multiple series. Apparently @martinmodrak has worked on a model like this (see Examples for hidden Markov models with longitudinal data).

The forward algorithm would cut down the number of combinations, and you only have 2 states (“ok” and “problem”). If a driver is in the “ok” state, their skill stays the same; if they are in the “problem” state, you decrease their skill. You could even increase the probability of transitioning into and staying in the “problem” state as a function of skill.
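
To give a rough idea of the forward recursion for the 2-state case, here is a generic sketch for a single driver's sequence of races. It assumes you can compute a per-state log-emission term for each race, which is a simplification: in your model the emission (the race likelihood) couples all drivers, so this only shows the shape of the computation, not a drop-in solution.

    functions {
        // Log-space forward algorithm for a 2-state chain
        // (state 1 = "ok", state 2 = "problem").
        //   log_emission[t, s] = log p(result at race t | state s)
        //   log_theta[s, s2]   = log p(state s2 next | state s now)
        //   log_init[s]        = log p(state s at the first race)
        real forward_2state(matrix log_emission, matrix log_theta, vector log_init) {
            int T = rows(log_emission);
            // alpha[s] = log p(races 1..t, state s at race t)
            vector[2] alpha = log_init + log_emission[1]';
            for (t in 2:T) {
                vector[2] alpha_new;
                for (s in 1:2) {
                    alpha_new[s] = log_sum_exp(alpha + col(log_theta, s))
                                   + log_emission[t, s];
                }
                alpha = alpha_new;
            }
            // marginal log-likelihood of the whole sequence,
            // summing over the final state
            return log_sum_exp(alpha);
        }
    }

I believe recent Stan versions (2.24+) also ship a built-in hmm_marginal function that performs exactly this forward pass, if you can get your emission terms into its format.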
