I am currently working on my postgraduate dissertation in statistics, and I am using Stan extensively for it. My dissertation involves simulating a latent position model of chess games that estimates the ratings of the respective players, so the model would be an alternative to the existing Elo rating system.
My raw data consist of all the matches played during January 2024 on Lichess among all titled, non-BOT players. The only inputs I am using from these data are the name of each unique player and the outcome of each match.
With regard to the Stan model, I have based the priors of both latent variables (the ratings and the white adjustment factor) on their respective distributions in the raw data.
Currently, my Stan model is as follows:
    data {
      int<lower=0> P; // number of players
      int<lower=0> N_games_matrix[P, P]; // number of games matrix
      int<lower=0> Y_matrix[P, P]; // scores matrix
    }
    parameters {
      vector<lower=2004.86, upper=3200>[P-1] gamma_free;
      vector<lower=0.015, upper=0.075>[P] W;
    }
    transformed parameters {
      vector[P] gamma; // latent ratings for each player
      gamma[1] = 0; // constrain the first player's gamma to 0
      for (p in 2:P) {
        gamma[p] = gamma_free[p-1]; // assign the rest of the gammas
      }
    }
    model {
      // Priors
      gamma_free ~ normal(2579.787, 191.9756); // prior for the free latent ratings
      W ~ normal(0.045, 0.5); // prior for the white player advantage
      // Likelihood
      for (i in 1:P) {
        for (j in 1:P) {
          if (i != j) { // Ensure i is not equal to j
            real eta = gamma[i] - gamma[j] + W[i];
            // Debugging: Print intermediate values and target log probability
            print("i: ", i, " j: ", j, " eta: ", eta);
            print("Y_matrix[i, j]: ", Y_matrix[i, j], " N_games_matrix[i, j]: ", N_games_matrix[i, j]);
            print("target(): ", target());
            Y_matrix[i, j] ~ binomial(2 * N_games_matrix[i, j], inv_logit(eta));
            // Debugging: Print updated target log probability
            print("Updated target(): ", target());
          }
        }
      }
    }
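For reference, the likelihood coded in the model block above can be written compactly as follows. The symbols $N_{ij}$, $Y_{ij}$, $\gamma_i$ and $W_i$ are shorthand introduced here for N_games_matrix[i, j], Y_matrix[i, j], gamma[i] and W[i]; they are notation only and do not appear in the code.

```latex
Y_{ij} \sim \mathrm{Binomial}\bigl(2 N_{ij},\ \operatorname{logit}^{-1}(\eta_{ij})\bigr),
\qquad
\eta_{ij} = \gamma_i - \gamma_j + W_i,
\qquad i \neq j,
\qquad \gamma_1 = 0 .
```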
However, I am consistently receiving errors like this:
    Chain 2: Log probability evaluates to log(0), i.e. negative infinity.
    Chain 2: Stan can't start sampling from this initial value.
    Chain 2:
    Chain 2: Initialization between (-2, 2) failed after 1 attempts.
In addition, when I included print statements to see exactly where the initialization fails, I get output like the following for each pair of players:
    Y_matrix[i, j]: 0 N_games_matrix[i, j]: 0
    target(): -inf
    Updated target(): -inf
    i: 948 j: 714 eta: -7.06877
    Y_matrix[i, j]: 0 N_games_matrix[i, j]: 0
    target(): -inf
Since N_games_matrix[i, j] and Y_matrix[i, j] are both zero for these pairs, I suspect that the binomial distribution is causing the problem. Can you offer any advice on this matter?
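One way to test this suspicion is sketched below. It is a sketch under stated assumptions, not a confirmed fix.

Two observations motivate it:
- A binomial with zero trials and zero successes has probability 1, so it should add log(1) = 0 to the target. The printed output also shows target() at -inf before the statement for pair (948, 714) runs. Together these suggest the -inf comes from an earlier pair.
- gamma[1] is fixed at 0 while every other rating is bounded below by 2004.86. Any eta involving player 1 is therefore at most 0 - 2004.86 + 0.075 = -2004.785, or at least 2004.875 in the other direction. In double precision inv_logit then returns exactly 0 or exactly 1, which gives log(0) as soon as player 1 has a game with a non-extreme score.

The sketch skips pairs without games, uses binomial_logit so the probability is never formed explicitly, and rescales rating differences. The constant rating_scale (the Elo-style log(10) / 400) is an assumption that is not in the original model. Leaving W unscaled on the logit scale is also an assumption.

```stan
data {
  int<lower=0> P;                     // number of players
  // Same declarations as the original model; newer Stan versions
  // require the form: array[P, P] int<lower=0> N_games_matrix;
  int<lower=0> N_games_matrix[P, P];  // number of games matrix
  int<lower=0> Y_matrix[P, P];        // scores matrix
}
transformed data {
  // ASSUMPTION (not in the original model): Elo-style factor that maps
  // rating points onto the logit scale.
  real rating_scale = log(10.0) / 400.0;
}
parameters {
  vector<lower=2004.86, upper=3200>[P - 1] gamma_free;
  vector<lower=0.015, upper=0.075>[P] W;
}
transformed parameters {
  vector[P] gamma;          // latent ratings for each player
  gamma[1] = 0;             // unchanged from the original model
  gamma[2:P] = gamma_free;  // remaining ratings
}
model {
  // Priors, unchanged from the original model
  gamma_free ~ normal(2579.787, 191.9756);
  W ~ normal(0.045, 0.5);

  // Likelihood
  for (i in 1:P) {
    for (j in 1:P) {
      // Skip the diagonal and pairs that never played each other
      if (i != j && N_games_matrix[i, j] > 0) {
        // W is left unscaled here (assumption)
        real eta = rating_scale * (gamma[i] - gamma[j]) + W[i];
        // binomial_logit takes the log-odds directly, so no explicit
        // inv_logit(eta) can round to exactly 0 or 1
        Y_matrix[i, j] ~ binomial_logit(2 * N_games_matrix[i, j], eta);
      }
    }
  }
}
```

If initialization succeeds with this version, that would point to the scale of eta rather than the zero-count pairs as the source of the -inf. Note that keeping gamma[1] = 0 still places player 1 more than 2,000 rating points below everyone else.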
Any help is greatly appreciated and all the best :)
Thanks,
Patrick