I have a model in which I’m trying to use team strengths to predict the outcome of multiple teams playing a particular game. My logistic model for estimating team scores appears to be fairly robust, but it doesn’t take into account changes in team strength over time. Any advice, or pointers to literature that has done this, would be appreciated.
Each game consists of a number of rounds which can be won or lost by either team. I’m using a logistic regression model where:
round_won ~ bernoulli_logit(team_a_strength-team_b_strength)
I could also use a binomial and do this per match, but I think the two should be functionally equivalent? I also specifically define my team strengths so that the mean of all team strengths is always equal to zero, to remove correlation from these variables.
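To check the equivalence claim for myself, here is a minimal Python sketch (not my Stan code; the outcomes and strength differences are made up for illustration). It compares the per-round Bernoulli log-likelihood with the per-match binomial one. Assuming the strength difference is constant across the rounds of a match, the two should differ only by the log binomial coefficient, which does not depend on the strengths and so should not change the posterior.

```python
import math

def bernoulli_logit_loglik(outcomes, eta):
    """Sum of per-round Bernoulli log-likelihoods with logit-scale parameter eta."""
    log_p = -math.log1p(math.exp(-eta))  # log(inv_logit(eta))
    log_q = -math.log1p(math.exp(eta))   # log(1 - inv_logit(eta))
    return sum(log_p if y == 1 else log_q for y in outcomes)

def binomial_logit_loglik(wins, n, eta):
    """Per-match binomial log-likelihood with the same logit-scale parameter."""
    log_p = -math.log1p(math.exp(-eta))
    log_q = -math.log1p(math.exp(eta))
    return math.log(math.comb(n, wins)) + wins * log_p + (n - wins) * log_q

# Hypothetical match: 7 rounds, team A wins 5 of them.
outcomes = [1, 0, 1, 1, 0, 1, 1]
wins, n = sum(outcomes), len(outcomes)

# Difference between the two log-likelihoods at two arbitrary strength gaps.
diff_a = binomial_logit_loglik(wins, n, 0.3) - bernoulli_logit_loglik(outcomes, 0.3)
diff_b = binomial_logit_loglik(wins, n, -1.2) - bernoulli_logit_loglik(outcomes, -1.2)
print(diff_a, diff_b, math.log(math.comb(n, wins)))
```

Both differences come out as the same constant, log C(7, 5), whatever strength gap is plugged in.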
I then have a vector of team scores and can produce match predictions from my posterior, accounting for various win conditions. This has produced reasonably good results in the past. However, using all historical data to calculate the team strengths is not always helpful: once a lot of data has been gathered, the strength estimates become overly precise, which may be unfair to teams that have improved recently. All the match data I have are stamped with Unix timestamps, so it would be possible to allow team strengths to vary with time. I can think of a couple of ways to do this.
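For context, the prediction step looks roughly like the following Python sketch. The win condition here (first team to reach a target number of round wins, with no overtime or draws) is an assumption chosen for illustration, and the function names and the posterior draws are hypothetical.

```python
import math

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def prob_first_to(target, p):
    """P(team A reaches `target` round wins before team B), rounds independent.

    Sums over k, the number of rounds B has won when A takes its final round.
    """
    return sum(
        math.comb(target - 1 + k, k) * p**target * (1 - p)**k
        for k in range(target)
    )

def match_win_prob(draws_a, draws_b, target):
    """Average the match win probability over paired posterior draws."""
    probs = [
        prob_first_to(target, inv_logit(a - b))
        for a, b in zip(draws_a, draws_b)
    ]
    return sum(probs) / len(probs)

# Hypothetical posterior draws of two teams' strengths.
draws_a = [0.40, 0.55, 0.30, 0.62]
draws_b = [0.10, -0.05, 0.20, 0.00]
print(match_win_prob(draws_a, draws_b, target=13))
```

Averaging over draws, rather than plugging in posterior means, is what carries the uncertainty in the strengths through to the match prediction.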
First, I could set up the model so that older data simply contribute less to the team strength calculation, e.g.:
team_strength ~ normal(current_team_strength, f(deltaT))
which is easy to set up but probably not representative of the real process, so it’s not easy to imagine what the function of time should look like.
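As one candidate for f(deltaT), here is a small Python sketch in which the variance grows linearly with the age of the match, which is what a random walk in strength would imply. The function name and the values of sigma0 and tau are placeholders I made up, not estimates, and I'm not claiming this is the right functional form.

```python
import math

SECONDS_PER_DAY = 86400

def observation_sd(match_ts, now_ts, sigma0=0.5, tau=0.05):
    """Hypothetical f(deltaT): sd of a past strength around the current one.

    sigma0 is the sd at deltaT = 0 and tau is the drift per sqrt(day);
    both are assumed values for illustration only.
    """
    delta_days = max(0, now_ts - match_ts) / SECONDS_PER_DAY
    return math.sqrt(sigma0**2 + tau**2 * delta_days)

now = 1_700_000_000  # example Unix timestamp
for age_days in (0, 30, 365):
    print(age_days, observation_sd(now - age_days * SECONDS_PER_DAY, now))
```

Older matches get a wider sd and so constrain the current strength less, but the choice of sigma0 and tau is exactly the part I don't know how to justify.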
Alternatively, I could do this properly and set up some kind of spline model where team_strength varies as a function of time with autocorrelation. This is probably more representative of the actual underlying process, and I think that weekly or fortnightly knot points make sense for a model like this. I’m less familiar with these types of models in Stan, and my worry is that there may not be enough data for a model like this to generate reasonable results, especially if the number of teams is large.
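To make the second idea concrete, here is a Python sketch of the generative process I have in mind. Note that it uses a weekly Gaussian random walk rather than a spline, as a simpler stand-in with the same kind of autocorrelation. The function names and the values of sigma_init and sigma_step are assumptions for illustration; in an actual model they would be parameters to estimate.

```python
import random

SECONDS_PER_WEEK = 7 * 24 * 3600

def week_index(ts, t0):
    """Map a Unix timestamp to a weekly period index relative to t0."""
    return (ts - t0) // SECONDS_PER_WEEK

def centre(values):
    """Shift values so their mean is zero (the sum-to-zero constraint)."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def simulate_strengths(n_teams, n_weeks, sigma_init, sigma_step, seed=0):
    """Simulate strengths[week][team] as a random walk, re-centred each week."""
    rng = random.Random(seed)
    strengths = [centre([rng.gauss(0, sigma_init) for _ in range(n_teams)])]
    for _ in range(1, n_weeks):
        prev = strengths[-1]
        strengths.append(centre([s + rng.gauss(0, sigma_step) for s in prev]))
    return strengths

strengths = simulate_strengths(n_teams=6, n_weeks=10, sigma_init=1.0, sigma_step=0.1)
t0 = 1_700_000_000
w = week_index(t0 + 15 * 24 * 3600, t0)  # a match 15 days after t0 falls in week 2
print(w, strengths[w])
```

Each round would then use the strengths of the week it was played in. My hope is that a single step size shared across teams would help with the sparse-data worry, since weeks with few matches would borrow from their neighbours, but I haven't verified that.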
I’m sure this is something that has been tackled before but I can’t find any literature that specifically tries to do time variance with logistic regression like this. Any suggestions or pointers would be helpful to improve this model.