# Help with Bayesian Modelling

I am currently working on my postgraduate dissertation in statistics and I am using STAN extensively for this. My dissertation involves simulating a latent position model of chess games which estimates the ratings of the respective chess players. So, the model would be an alternative to the existing Elo Rating system.

My raw data consists of all the matches played during January 2024 on Lichess among all titled, non-BOT players. The only input I am using from this data are the names of each unique player and the outcome for each match.

With regards the STAN model, I have based the priors of both latent variables (ratings and white adjustment factor) on their respective distribution from the raw data.

Currently, my STAN model is as follows:
data {
int<lower=0> P; // number of players
int<lower=0> N_games_matrix[P, P]; // number of games matrix
int<lower=0> Y_matrix[P, P]; // scores matrix
}

parameters {
vector<lower=2004.86, upper=3200>[P-1] gamma_free;
vector<lower=0.015, upper=0.075>[P] W;
}

transformed parameters {
vector[P] gamma; // latent ratings for each player
gamma[1] = 0; // constrain the first playerâ€™s gamma to 0

for (p in 2:P) {
gamma[p] = gamma_free[p-1]; // assign the rest of the gammas
}
}

model {
// Priors
gamma_free ~ normal(2579.787, 191.9756); // prior for the free latent ratings
W ~ normal(0.045, 0.5); // prior for the white player advantage

// Likelihood
for (i in 1:P) {
for (j in 1:P) {
if (i != j) { // Ensure i is not equal to j
real eta = gamma[i] - gamma[j] + W[i];

``````    // Debugging: Print intermediate values and target log probability
print("i: ", i, " j: ", j, " eta: ", eta);
print("Y_matrix[i, j]: ", Y_matrix[i, j], " N_games_matrix[i, j]: ", N_games_matrix[i, j]);
print("target(): ", target());

Y_matrix[i, j] ~ binomial(2 * N_games_matrix[i, j], inv_logit(eta));

// Debugging: Print updated target log probability
print("Updated target(): ", target());
}
}
``````

}
}

However, I am consistently receiving errors like this:
â€śChain 2: Log probability evaluates to log(0), i.e. negative infinity.
Chain 2: Stan canâ€™t start sampling from this initial value.
Chain 2:
Chain 2: Initialization between (-2, 2) failed after 1 attempts.â€ť

In addition, when I included print statements to see where exactly the intialization fails, I get the following for each estimate value:
Y_matrix[i, j]: 0 N_games_matrix[i, j]: 0
target(): -inf
Updated target(): -inf
i: 948 j: 714 eta: -7.06877
Y_matrix[i, j]: 0 N_games_matrix[i, j]: 0
target(): -inf

Since N_games_matrix[i, j] and Y_matrix[i, j] are zero, I am pretty sure the binomial distribution might be causing issues. Can you offer any advice on this matter?

Any help is greatly appreciated and all the best :)

Thanks,
Patrick

1 Like

As an aside,

may be confusing as the names indicate a matrix, but these are two dimensional arrays. Modern Stan syntax would have them like so:

`array[P, P] int<lower=0> variable_name;`

If you instead want a matrix, you would specify it like so:

`matrix<lower=0>[P, P] variable_name;`

Since in the model block youâ€™re printing out these data variables and they are all zero, I assume this isnâ€™t the data you are intending to pass into Stan, so youâ€™ll need to check that you are actually passing Stan the proper structure.

Once you get that sorted â€¦

Iâ€™m not familiar with what you are trying to achieve here, but your parameter constraints seem suspect to me. For gamma_free, you are saying it is impossible that the parameter falls below exactly 2004.86 or above 3200, for example, is this what you intend? The range of your prior says otherwise.

1 Like

The input matrices, N_games_matrix and Y_matrix, are not all zero though. The N_games_matrix is the number of games played by each pair of players, when a certain player is White. The Y_matrix is the cumulative score by each pair of players, when a certain player is White. Both matrices are sparse but there are some non-zero values in both matrices.

Secondly, the range of the gamma parameter should be Normally distributed and be between 2004.86 and 3200. However, I am unsure what prior to use to achieve this.

Regarding the zeros, I was responding to what you said here as you havenâ€™t shown the data and such:

Hi Scott,

I understand what you mean now.

I attach the CSV file of my raw data. I am only using the fields â€śWhiteâ€ť, â€śBlackâ€ť and :Result" with my STAN model. The Elo ratings will be averaged per player and compared against the posterior ratings from my model.

In addition, I construct the 2 inputted matrices using the attached R script.

Once the data is prepared, I compile the STAN model with the prepared data using the attached MCMC R file script.

MCMC.R (3.48 KB)

If you are seeing all zeros as you said, check in `R` that your two matrices:

``````data_list <- list(
P = P,                              # Total number of unique players
N_games_matrix = Number_of_games_matrix,  # Matrix of number of games
Y_matrix = Score_per_matrix,             # Matrix of scores
unique_players = all_players              # Unique player names
)
``````

do not contain all zeros, actually have the data you expect. If they are large, Maybe make the matrix smaller so you can easily review it. And when you pass them into Stan you see that data.
For example, below shows that Stan gets the same data I gave it regardless of it being an array or matrix.

test.stan:

``````data {
int<lower=0> P;
array[P,P] real X;
matrix[P,P] Y;
}

generated quantities {
print("X =", X);
print("Y =", Y);
}
``````

In R,

`m\$sample(data = list(P = 2, X = diag(1:2), Y = diag(3:4)), fixed_param = TRUE, iter_warmup = 0, iter_sampling = 2)`

Returns

``````Chain 1 X =[[1,0],[0,2]]
Chain 1 Y =[[3,0],[0,4]]
``````

And that you are incrementing `target` on all non-diagonal values (some or all or which have zeros),