# Bradley-Terry tennis model: constant covariance works, but time-dependent covariance doesn't. What's going wrong?

I am building a Bradley-Terry model to infer tennis player skills. I assume each player has a skill vector that is trivariate normal with mean 0 and a covariance matrix that depends on the amount of time since the player last played a match. I use a trivariate normal because there are three court surfaces in tennis: hard, clay, and grass. My hypothesis for the covariance is that if a player hasn't played in a while, there is more uncertainty in their skill. This is similar to the idea behind Glicko.
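
For intuition, here is a small NumPy sketch of that covariance construction (mirroring Stan's `quad_form_diag(O, v) = diag(v) * O * diag(v)`). The correlation matrix, baseline scales `tau`, and time coefficient `beta` below are made-up illustration values, not estimates:

```python
import numpy as np

# Hypothetical values, for illustration only.
O = np.array([[1.0, 0.6, 0.4],
              [0.6, 1.0, 0.5],
              [0.4, 0.5, 1.0]])   # correlation across hard / clay / grass
tau = np.array([0.8, 0.9, 1.0])   # baseline skill scale per surface
beta = 0.2                        # extra scale per month without a match

def skill_cov(months_off):
    # Stan's quad_form_diag(O, v) is diag(v) @ O @ diag(v)
    v = tau + beta ** 2 * months_off
    return np.diag(v) @ O @ np.diag(v)

# With no layoff the variances are just tau^2; they grow with time off.
assert np.allclose(np.diag(skill_cov(0.0)), tau ** 2)
assert np.all(np.diag(skill_cov(12.0)) > np.diag(skill_cov(0.0)))
```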

I am using a negative binomial likelihood on the set counts, since in tennis you must win either 2 or 3 sets to take the match, depending on whether it is a best-of-3 or best-of-5 match.
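
As a quick sanity check on this parameterization, here is the same PMF in plain Python. With a per-set win probability of 0.5, the winner's possible set-loss counts y = 0, ..., r-1 should account for exactly half the total probability, since either player is equally likely to reach r set wins first:

```python
import math

def negbin_lpmf(y: int, p: float, r: int) -> float:
    # log P(lose y sets before winning the r-th set), per-set win prob p
    return math.log(math.comb(y + r - 1, y)) + r * math.log(p) + y * math.log(1.0 - p)

# With p = 0.5 the match is a coin flip, so summing over the winner's
# possible set losses must give probability 0.5, for both formats.
for r in (2, 3):  # best-of-3 and best-of-5
    match_win_prob = sum(math.exp(negbin_lpmf(y, 0.5, r)) for y in range(r))
    assert abs(match_win_prob - 0.5) < 1e-12
```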

The model is:

```stan
functions {
  // Log-PMF of the negative binomial distribution: y failures before
  // r successes, with success probability p.
  // vectorized
  real negbin_vec_lpmf(int[] y, vector p, int[] r) {
    int n = num_elements(y);
    vector[n] y_ = to_vector(y);
    vector[n] r_ = to_vector(r);
    return sum(lchoose(y_ + r_ - 1, y_) + r_ .* log(p) + y_ .* log1m(p));
  }
  // element-by-element
  real negbin_lpmf(int y, real p, int r) {
    return lchoose(y + r - 1, y) + r * log(p) + y * log1m(p);
  }
}
data {
  int<lower=0> J;                   // number of players
  int<lower=0> N_train;             // number of matches
  matrix[N_train, J] z;             // z[i, j] = 1 if player j participates in match i
  matrix[N_train, 3 * J] X_skill;   // design matrix for player skills
  matrix[N_train, J] X_skill_time;  // months between each player's last event and the current event
  int total_n[N_train];             // whether the match was best of 3 or best of 5 sets
  int winner_y[N_train];            // number of sets LOST by the winner of the match
}
transformed data {
  vector[3] Zero = rep_vector(0, 3);
  // Required number of set wins: 2 for a best-of-3 match, 3 for a best-of-5 match
  int<lower=2, upper=3> r[N_train];
  for (i in 1:N_train) {
    if (total_n[i] == 3) {
      r[i] = 2;
    } else {
      r[i] = 3;
    }
  }
}
parameters {
  cholesky_factor_corr[3] Lcorr0;
  row_vector[3] alpha[J];  // player skills over the rating periods
  real<lower=0> beta;      // how uncertainty grows as a player misses time
  vector<lower=0>[3] tau;
}
transformed parameters {
  // 3x3 correlation and covariance matrices
  corr_matrix[3] O = multiply_lower_tri_self_transpose(Lcorr0);
  cov_matrix[3] Omega = quad_form_diag(O, tau);
}
model {
  // Priors
  Lcorr0 ~ lkj_corr_cholesky(1);
  tau ~ cauchy(0, 25);
  beta ~ normal(0, 25);
  // Build up covariance matrices
  for (i in 2:(N_train + 1)) {
    for (j in 1:J) {
      if (z[i - 1, j] == 1 && X_skill_time[i - 1, j] > 0) {
        // Time-varying covariance for the two players participating in match i-1
        alpha[j] ~ multi_normal(Zero, quad_form_diag(O, tau + beta^2 * X_skill_time[i - 1, j]));
      } else {
        // Otherwise use the constant covariance
        alpha[j] ~ multi_normal(Zero, Omega);
      }
    }
  }
  // Likelihood: set-win probability for each match
  vector[N_train] s = inv_logit(X_skill * to_vector(to_matrix(alpha)));
  target += negbin_vec_lpmf(winner_y | s, r);
}
```

When I try to fit the model, essentially all transitions diverge, Rhat is bad, and ESS is bad; something is clearly going wrong, and I'm not sure what to do. Note that if I instead assume the covariance is constant and shared by all players, specifying `alpha ~ multi_normal(Zero, Omega);`, the model fits fast, the estimates are reasonable, and the diagnostics are good. But I can't see what's going wrong with my "time-dependent" covariance here.

I've attached CSVs of `z`, `X_skill`, `X_skill_time`, `total_n`, and `winner_y` so you can run the model yourself. For these data, `J = 15` and `N_train = 114`.

Thanks for any help!

total_n.csv (232 Bytes)
winner_y.csv (232 Bytes)
X_skill_time.csv (3.5 KB)
X_skill.csv (11.2 KB)
z.csv (3.8 KB)


I might be off, but it seems odd that you repeatedly assign a multi_normal distribution to `alpha[j]` inside of the loop over `i`. It seems like you would want to assign a distribution to each `alpha[j]` only once (bringing it outside of the loop over `i`), or declare a new alpha for each combination of `i` and `j`.
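
Concretely, every `~` statement adds its log density to `target`, so restating the same prior once per match raises the prior density to a high power, which acts like a drastically tighter prior. A univariate Python sketch of the effect (the repeat count 114 is just the number of matches in this data set; the normal here stands in for the multivariate one):

```python
import numpy as np

def normal_lpdf(x, sd):
    # log density of Normal(0, sd), elementwise
    return -0.5 * (x / sd) ** 2 - np.log(sd) - 0.5 * np.log(2.0 * np.pi)

k = 114  # prior restated once per match
x = np.linspace(-3.0, 3.0, 201)
repeated = k * normal_lpdf(x, 1.0)           # k identical ~ statements
tighter = normal_lpdf(x, 1.0 / np.sqrt(k))   # one normal with sd 1/sqrt(k)

# Up to an additive constant, the two log densities are identical, so the
# looped priors behave like a single prior with sd shrunk by sqrt(k).
assert np.allclose(repeated - repeated.max(), tighter - tighter.max())
```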

I also was initially confused because I was reading the letter O as a zero, but I think that part is ok.

Thanks for your feedback, and you're right: the repeated sampling statements for `alpha[j]` inside the loop over `i` were causing issues. I've since revised the model by introducing a `volatility` parameter: for each match I sample a normally distributed `volatility` for each of the two players. The scale of the `volatility` is a function of the player's time off and their baseline `erraticity`, which represents the variance of each player's match-by-match performance. The model is as follows.

```stan
functions {
  // Log-PMF of the negative binomial distribution: y failures before
  // r successes, with success probability p.
  // vectorized
  real negbin_vec_lpmf(int[] y, vector p, int[] r) {
    int n = num_elements(y);
    vector[n] y_ = to_vector(y);
    vector[n] r_ = to_vector(r);
    return sum(lchoose(y_ + r_ - 1, y_) + r_ .* log(p) + y_ .* log1m(p));
  }
  // element-by-element
  real negbin_lpmf(int y, real p, int r) {
    return lchoose(y + r - 1, y) + r * log(p) + y * log1m(p);
  }
}
data {
  int<lower=0> J;                  // number of players
  int<lower=0> N_train;            // number of matches
  matrix[N_train, J] z1;           // z1[i, j] = 1 if player j is player 1 in match i
  matrix[N_train, J] z2;           // z2[i, j] = 1 if player j is player 2 in match i
  matrix[N_train, 3 * J] X_skill;  // design matrix for player skills
  vector[N_train] X1;              // months since player 1 last played before match i
  vector[N_train] X2;              // months since player 2 last played before match i
  int total_n[N_train];            // whether the match was best of 3 or best of 5 sets
  int winner_y[N_train];           // number of sets LOST by the winner of the match
}
transformed data {
  vector[3] Zero = rep_vector(0, 3);
  // Required number of set wins: 2 for a best-of-3 match, 3 for a best-of-5 match
  int<lower=2, upper=3> r[N_train];
  for (i in 1:N_train) {
    if (total_n[i] == 3) {
      r[i] = 2;
    } else {
      r[i] = 3;
    }
  }
}
parameters {
  cholesky_factor_corr[3] Lcorr0;
  row_vector[3] alpha[J];   // player skills over the rating periods
  real beta;                // how uncertainty grows as a player misses time
  vector<lower=0>[3] tau;
  vector[J] erratic;        // player's baseline erraticity (same across all court types)
  matrix[N_train, 2] volatility;  // time-dependent volatility of the two competing players,
                                  // reflecting baseline erraticity and time off for each match
}
transformed parameters {
  // 3x3 correlation and covariance matrices
  corr_matrix[3] O = multiply_lower_tri_self_transpose(Lcorr0);
  cov_matrix[3] Omega = quad_form_diag(O, tau);
}
model {
  // Priors
  Lcorr0 ~ lkj_corr_cholesky(1);
  tau ~ cauchy(0, 25);
  beta ~ normal(0, 1);
  alpha ~ multi_normal(Zero, Omega);
  erratic ~ normal(0, 1);
  // Volatilities for each match
  volatility[:, 1] ~ normal(0, exp(beta * X1 + z1 * erratic));
  volatility[:, 2] ~ normal(0, exp(beta * X2 + z2 * erratic));
  // Likelihood: set-win probability for each match
  vector[N_train] s = inv_logit(X_skill * to_vector(to_matrix(alpha))
                                + volatility[:, 1] - volatility[:, 2]);
  target += negbin_vec_lpmf(winner_y | s, r);
}
```

Note that I don’t really care about the `volatility` parameter in and of itself. What I really care about is each player’s `erratic` tendencies and how much missing time might increase uncertainty in a match, which `beta` describes.

Now I'm trying to fit the model on a bigger data set (which I think is important since I'm estimating more things, particularly a lot of scale parameters). I have N = 1,745 and J = 437, and unfortunately about 20-25% of transitions diverge. The `alpha` diagnostics look relatively okay, but the `erratic` ones do not. Any ideas on how to solve this? The new data to fit the model are attached.

X_skill.csv (4.4 MB)
X1.csv (3.4 KB)
X2.csv (3.5 KB)
z1.csv (1.5 MB)
z2.csv (1.5 MB)

The last two pieces of data are `winner_y` and `total_n`, which I couldn’t attach due to the 5 file limit…
total_n.csv (3.4 KB)
winner_y.csv (3.4 KB)

It looks like a parameter identifiability problem: you can add the same constant to both columns of `volatility` and get identical predictions, because only the difference of the two columns enters the likelihood. Maybe define the volatility difference as a single parameter, where the variance of the difference is the sum of the two variances you defined for the columns of `volatility`.
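
For reference, the scale of that difference parameter follows from independence: if the two volatilities are normal(0, s1) and normal(0, s2), their difference is normal with sd sqrt(s1^2 + s2^2), i.e. the variances add, not the standard deviations. A quick Monte Carlo check in Python with hypothetical scale values:

```python
import numpy as np

rng = np.random.default_rng(0)
s1, s2 = 0.7, 1.9  # hypothetical per-player volatility scales
v1 = rng.normal(0.0, s1, 1_000_000)
v2 = rng.normal(0.0, s2, 1_000_000)

# The difference v1 - v2 is normal with sd sqrt(s1^2 + s2^2) ~= 2.025,
# not s1 + s2 = 2.6: variances add, standard deviations do not.
sd_diff = (v1 - v2).std()
assert abs(sd_diff - np.hypot(s1, s2)) < 0.01
assert abs(sd_diff - (s1 + s2)) > 0.5
```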