Bradley-Terry tennis model: constant covariance works, but time-dependent covariance doesn't. What's going wrong?

I am building a Bradley-Terry model to infer tennis player skills. I assume each player has a skill vector that is trivariate normal with mean 0 and a covariance matrix that depends on the amount of time since the player last played a match. I use a trivariate normal because there are three court surfaces in tennis: hard, clay, and grass. My hypothesis for the covariance is that if a player hasn’t played in a while, there is more uncertainty about their skill. This is similar to the idea behind Glicko.
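To make the covariance idea concrete, here is a small numpy sketch (not part of the Stan model) of how the per-surface sds get inflated with time off, mirroring the model's quad_form_diag(O, tau + beta^2 * t). The values of O, tau, and beta below are made up for illustration, not fitted quantities:

```python
# Illustrative sketch of "more time off => more uncertainty":
# scale the per-surface sds tau up with months of inactivity t.
# O, tau, and beta are arbitrary made-up values, not estimates.
import numpy as np

O = np.array([[1.0, 0.5, 0.3],   # correlation across hard/clay/grass
              [0.5, 1.0, 0.4],
              [0.3, 0.4, 1.0]])
tau = np.array([1.0, 1.2, 0.9])  # baseline per-surface skill sds
beta = 0.2

def skill_cov(t: float) -> np.ndarray:
    """Covariance of a player's skill vector after t months off."""
    s = tau + beta**2 * t          # inflated sds
    return np.diag(s) @ O @ np.diag(s)

# Variances grow with time off, and the matrix stays positive definite.
cov0, cov12 = skill_cov(0.0), skill_cov(12.0)
assert np.all(np.diag(cov12) > np.diag(cov0))
assert np.all(np.linalg.eigvalsh(cov12) > 0)
```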

I am using a negative binomial likelihood on the set counts, since in tennis you must win either 2 or 3 sets to win the match depending on whether it is a best of 3 or 5 set match.
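As a sanity check on that likelihood, the hand-rolled log-PMF used in the model below, log C(y+r-1, y) + r*log(p) + y*log(1-p) with y = sets lost by the winner and r = sets needed to win, can be verified to be a proper negative binomial in pure Python (this is just a numeric check, not part of the model):

```python
# Check that the negative binomial log-PMF used in the Stan model
# normalizes to 1 when summed over all possible failure counts y.
from math import exp, lgamma, log, log1p

def negbin_lpmf(y: int, p: float, r: int) -> float:
    """log P(y failures before the r-th success), success prob p."""
    lchoose = lgamma(y + r) - lgamma(y + 1) - lgamma(r)
    return lchoose + r * log(p) + y * log1p(-p)

# Best-of-5 setting (r = 3) with per-set win probability p = 0.6:
total = sum(exp(negbin_lpmf(y, 0.6, 3)) for y in range(200))
assert abs(total - 1.0) < 1e-12
```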

The model is:

functions {
  // Computes the log-PMF of the negative binomial distribution: y failures before the r-th success, with success probability p
  // vectorized
  real negbin_vec_lpmf(int[] y, vector p, int[] r) {
    int n = num_elements(y);
    vector[n] y_ = to_vector(y);
    vector[n] r_ = to_vector(r);
    return sum(lchoose(y_ + r_ - 1, y_) + r_.*log(p) + y_.*log1m(p));
  }
  // element-by-element
  real negbin_lpmf(int y, real p, int r) {
    real lp = lchoose(y + r - 1, y) + r*log(p) + y*log1m(p);
    return lp;
  }
}
data {
  int<lower=0> J;         // number of players
  int<lower=0> N_train;  // number of matches
  matrix[N_train, J] z;  // indicator matrix where z[i,j] = 1 if player j participates in match i
  matrix[N_train, 3*J] X_skill;  // design matrix for player skills
  matrix[N_train, J] X_skill_time;  // number of months between each player's last event and current event 
  int total_n[N_train];  // whether the match was best of 3 or 5 sets
  int winner_y[N_train];  // number of sets LOST by the winner of the match
}
transformed data {
  vector[3] Zero = rep_vector(0, 3);
  // Build up requisite number of set victories, which is 2 for a best of 3 match or 3 for a best of 5 match
  int<lower=2, upper=3> r[N_train];
  for (i in 1:N_train) {
    if (total_n[i] == 3) {
      r[i] = 2;
    } else {
      r[i] = 3;
    }
  }
}
parameters {
  cholesky_factor_corr[3] Lcorr0;
  row_vector[3] alpha[J];  // player skills over the rating periods
  real<lower=0> beta;  // how uncertainty grows as a player misses time
  vector<lower=0>[3] tau;
}
transformed parameters {
  // define 3x3 correlation and covariance matrices
  cov_matrix[3] Omega;
  corr_matrix[3] O;
  O = multiply_lower_tri_self_transpose(Lcorr0);
  Omega = quad_form_diag(O, tau);
}
model {
  // Priors
  Lcorr0 ~ lkj_corr_cholesky(1);
  tau ~ cauchy(0, 25);
  beta ~ normal(0, 25);
  // Build up covariance matrices
  for (i in 2:(N_train+1)) {
    for (j in 1:J) {
      if (z[i-1, j] == 1 && X_skill_time[i-1, j] > 0) { // Time-varying covariance for the two players participating in match i-1
        alpha[j] ~ multi_normal(Zero, quad_form_diag(O, tau + beta^2*X_skill_time[i-1, j]));
      } else { // Otherwise use constant covariance
        alpha[j] ~ multi_normal(Zero, Omega);
      }
    }
  }
  // Likelihood
  // Build up set win probability vector for each match
  vector[N_train] s;
  s = inv_logit(X_skill * to_vector(to_matrix(alpha)));
  target += negbin_vec_lpmf(winner_y | s, r);
}

When I try to fit the model, every transition diverges, Rhat is bad, ESS is bad; basically, something is going wrong, and I’m not quite sure what to do. Note that if I instead assume the covariance is constant and the same for all players and specify alpha ~ multi_normal(Zero, Omega);, the model fits fast, the estimates are reasonable, diagnostics are good, and everything is fine. But I’m not sure what’s going wrong with my “time-dependent” covariance here.

I’ve attached CSVs of z, X_skill, X_skill_time, total_n, and winner_y so you can run the model yourself. For these data, J = 15 and N_train = 114.

Thanks for any help!

total_n.csv (232 Bytes)
winner_y.csv (232 Bytes)
X_skill_time.csv (3.5 KB)
X_skill.csv (11.2 KB)
z.csv (3.8 KB)


I might be off, but it seems odd that you repeatedly assign a multi_normal distribution to alpha[j] inside the loop over i. Each sampling statement increments the target log density, so each alpha[j] effectively gets its prior applied once per iteration in which it appears. It seems like you would want to assign a distribution to each alpha[j] only once (bringing it outside the loop over i), or declare a new alpha for each combination of i and j.
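To see why the repeated statements matter, here is a quick stdlib-Python illustration (a sketch, not Stan): repeating `x ~ normal(0, sigma)` N times multiplies the density N times, which is the same as a single normal(0, sigma/sqrt(N)) prior up to a constant, so the prior gets sharper than intended:

```python
# In Stan, each sampling statement adds a log density to the target,
# so a prior statement hit N times in a loop multiplies the prior.
# For normal(0, sigma) repeated N times, the implied (unnormalized)
# density matches normal(0, sigma/sqrt(N)).
from math import isclose, log, pi, sqrt

def normal_lpdf(x: float, sigma: float) -> float:
    return -0.5 * log(2 * pi) - log(sigma) - 0.5 * (x / sigma) ** 2

x, sigma, N = 0.7, 2.0, 10
repeated = N * normal_lpdf(x, sigma)         # statement hit N times
collapsed = normal_lpdf(x, sigma / sqrt(N))  # equivalent single prior

# They agree up to an x-independent normalizing constant:
const = repeated - collapsed
x2 = -1.3
assert isclose(N * normal_lpdf(x2, sigma) - normal_lpdf(x2, sigma / sqrt(N)), const)
```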

I also was initially confused because I was reading the letter O as a zero, but I think that part is ok.

Thanks for your feedback, and you’re right – the loops over the alpha[j] were causing issues. I’ve since improved the model by introducing a volatility parameter, where I sample a normally distributed volatility for each of the two players in a given match. The scale of the volatility is a function of the player’s time off and their baseline erraticity, which represents the variance of each player’s match-by-match performance. The model is as follows.

functions {
  // Computes the log-PMF of the negative binomial distribution: y failures before the r-th success, with success probability p
  // vectorized
  real negbin_vec_lpmf(int[] y, vector p, int[] r) {
    int n = num_elements(y);
    vector[n] y_ = to_vector(y);
    vector[n] r_ = to_vector(r);
    return sum(lchoose(y_ + r_ - 1, y_) + r_.*log(p) + y_.*log1m(p));
  }
  // element-by-element
  real negbin_lpmf(int y, real p, int r) {
    real lp = lchoose(y + r - 1, y) + r*log(p) + y*log1m(p);
    return lp;
  }
}
data {
  int<lower=0> J;         // number of players
  int<lower=0> N_train;  // number of matches
  matrix[N_train, J] z1;  // indicator matrix where z[i, j] = 1 if player j is player 1 in match i
  matrix[N_train, J] z2;  // indicator matrix where z[i, j] = 1 if player j is player 2 in match i
  matrix[N_train, 3*J] X_skill;  // design matrix for player skills
  vector[N_train] X1;  // months passed since player 1 last played before match i 
  vector[N_train] X2;  // months passed since player 2 last played before match i 
  int total_n[N_train];  // whether the match was best of 3 or 5 sets
  int winner_y[N_train];  // number of sets LOST by the winner of the match
}
transformed data {
  vector[3] Zero = rep_vector(0, 3);
  // Build up requisite number of set victories, which is 2 for a best of 3 match or 3 for a best of 5 match
  int<lower=2, upper=3> r[N_train];
  for (i in 1:N_train) {
    if (total_n[i] == 3) {
      r[i] = 2;
    } else {
      r[i] = 3;
    }
  }
}
parameters {
  cholesky_factor_corr[3] Lcorr0;
  row_vector[3] alpha[J];  // player skills over the rating periods
  real beta;  // how uncertainty grows as a player misses time
  vector<lower=0>[3] tau;
  vector[J] erratic;  // player's baseline erraticity (same across all court types)
  matrix[N_train, 2] volatility;  // time dependent volatility of the two competing players that accounts for their baseline erraticity and "time off" for each match
}
transformed parameters {
  // define 3x3 correlation and covariance matrices
  cov_matrix[3] Omega;
  corr_matrix[3] O;
  O = multiply_lower_tri_self_transpose(Lcorr0);
  Omega = quad_form_diag(O, tau);
}
model {
  // Priors
  Lcorr0 ~ lkj_corr_cholesky(1);
  tau ~ cauchy(0, 25);
  beta ~ normal(0, 1);
  alpha ~ multi_normal(Zero, Omega);
  erratic ~ normal(0, 1);
  // Likelihood
  // Build up volatilities for each match
  volatility[, 1] ~ normal(0, exp(beta * X1 + z1 * erratic));
  volatility[, 2] ~ normal(0, exp(beta * X2 + z2 * erratic));
  // Build up set win probability vector for each match
  vector[N_train] s;
  s = inv_logit(X_skill * to_vector(to_matrix(alpha)) + volatility[, 1] - volatility[, 2]);
  target += negbin_vec_lpmf(winner_y | s, r);
}

Note that I don’t really care about the volatility parameter in and of itself. What I really care about is each player’s erratic tendencies and how much missing time might increase uncertainty in a match, which beta describes.

Now I’m trying to fit the model on a bigger data set (which I think might be important since I’m estimating more things, particularly a lot of scale parameters). I have N = 1,745 and J = 437, and unfortunately about 20-25% of transitions diverge. The alpha diagnostics seem relatively okay, but the erratic ones are not. Any ideas on how to solve this? The new data to fit the model are attached.

X_skill.csv (4.4 MB)
X1.csv (3.4 KB)
X2.csv (3.5 KB)
z1.csv (1.5 MB)
z2.csv (1.5 MB)

The last two pieces of data are winner_y and total_n, which I couldn’t attach due to the 5 file limit…
total_n.csv (3.4 KB)
winner_y.csv (3.4 KB)

It looks like a parameter identification problem: you can add any constant to both columns of volatility and get the same predictions, because only their difference (the second column subtracted from the first) enters the likelihood. Maybe define the volatility difference as a single parameter, where the variance of the difference is the sum of the two variances you defined for the columns of volatility.
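The non-identifiability is easy to confirm numerically. Below is a small numpy check (with arbitrary made-up values standing in for X_skill * alpha and the volatilities) showing that shifting both volatility columns by any constant leaves the set-win probabilities unchanged:

```python
# Numeric check of the identification problem: only the difference
# volatility[,1] - volatility[,2] enters the likelihood, so adding a
# constant c to both columns leaves s unchanged. Values are arbitrary.
import numpy as np

def inv_logit(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
eta = rng.normal(size=5)          # stand-in for X_skill * alpha
v1, v2 = rng.normal(size=5), rng.normal(size=5)
c = 37.0                          # arbitrary constant shift

s = inv_logit(eta + v1 - v2)
s_shifted = inv_logit(eta + (v1 + c) - (v2 + c))
assert np.allclose(s, s_shifted)
```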