Trouble plotting latent positions in Latent Space Model

Patrick_O_Rourke · June 4, 2024, 1:46pm

The following is the Stan program I am using:

data {
  int<lower=0> N; // number of games
  int<lower=1> P; // number of players
  int<lower=0> y[N]; // number of games won by player i
  int<lower=0> N_ij[N]; // total number of games played
  int<lower=1> i[N]; // player i index
  int<lower=1> j[N]; // player j index
  int<lower=0,upper=1> white[N]; // indicator if player i played white
}

parameters {
  vector[N] gamma_white; // player ratings for White players for each game
  vector[N] gamma_black; // player ratings for Black players for each game
  vector[N] W; // adjustment factor for White player advantage for each game
}

model {
  // Priors
  gamma_white ~ normal(0, 1);
  gamma_black ~ normal(0, 1);
  W ~ normal(0, 1);

  // Likelihood
  for (n in 1:N) {
    real eta;
    if (white[n] == 1) {
      eta = gamma_white[n] - gamma_black[n] + W[n];
    } else {
      eta = gamma_black[n] - gamma_white[n] + W[n];
    }
    y[n] ~ binomial(N_ij[n], inv_logit(eta));
  }
}

The R script that runs the Stan program is as follows:

# Load necessary libraries
library(rstan)
library(bayesplot)

# Set the working directory
setwd("/Users/patrickorourke/Desktop/UCD/Dissertation.nosync")

# Load data_list
load("data_list.RData")

# Compile the Stan model
stan_model <- stan_model("chess_model.stan")

# Fit the model using MCMC
fit <- sampling(stan_model, data = data_list, iter = 2000, chains = 4)

# Load coda, which provides the mcmc and mcmc.list classes used below
library(coda)

# Extracting samples for each parameter
gamma_white_samples <- extract(fit, pars = "gamma_white")$gamma_white
gamma_black_samples <- extract(fit, pars = "gamma_black")$gamma_black
W_samples <- extract(fit, pars = "W")$W

# Combine all chains for each parameter into mcmc objects
gamma_white_mcmc <- as.mcmc(gamma_white_samples)
gamma_black_mcmc <- as.mcmc(gamma_black_samples)
W_mcmc <- as.mcmc(W_samples)

# Create an mcmc.list object
mcmc_list <- mcmc.list(gamma_white_mcmc, gamma_black_mcmc, W_mcmc)




# Calculate summary statistics for each parameter
summary_stats <- summary(fit)$summary
print(summary_stats)

# Autocorrelation plot for the first element of each parameter
mcmc_acf(as.array(fit), pars = c("gamma_white[1]", "gamma_black[1]", "W[1]"))

# Trace plots for the first element of each parameter
stan_trace(fit, pars = c("gamma_white[1]", "gamma_black[1]", "W[1]"), inc_warmup = FALSE)

# Print the warmup adaptation settings (buffers and window size)
print(fit@stan_args[[1]]$adapt_term_buffer)
print(fit@stan_args[[1]]$adapt_init_buffer)
print(fit@stan_args[[1]]$adapt_window)

# Ensure required libraries are loaded
library(reshape2)
library(ggplot2)

# Convert gamma_white and gamma_black samples to data frames
gamma_white_df <- as.data.frame(gamma_white_samples)
gamma_black_df <- as.data.frame(gamma_black_samples)

# Add an Iteration identifier to the data frames
gamma_white_df$Iteration <- 1:nrow(gamma_white_df)
gamma_black_df$Iteration <- 1:nrow(gamma_black_df)

# Melt data frames for plotting
gamma_white_melt <- melt(gamma_white_df, id.vars = "Iteration", variable.name = "Player", value.name = "Gamma_White")
gamma_black_melt <- melt(gamma_black_df, id.vars = "Iteration", variable.name = "Player", value.name = "Gamma_Black")

# Combine melted data frames for plotting
combined_df <- merge(gamma_white_melt, gamma_black_melt, by = c("Iteration", "Player"))

# Scatter plot of latent positions
latent_positions_plot <- ggplot(combined_df, aes(x = Gamma_White, y = Gamma_Black)) +
  geom_point(alpha = 0.3) +
  labs(x = "Gamma White (Latent Position)", y = "Gamma Black (Latent Position)",
       title = "Scatter Plot of Latent Positions (Gamma White vs. Gamma Black)") +
  theme_minimal()

# Save the plot as a PNG file in the current working directory
ggsave("latent_positions_plot.png", plot = latent_positions_plot, width = 10, height = 8)

ggplot(gamma_white_melt, aes(x = Gamma_White)) +
  geom_histogram(bins = 50, alpha = 0.6, fill = "blue") +
  labs(title = "Distribution of Gamma_White", x = "Gamma_White", y = "Count") +
  theme_minimal()

ggplot(gamma_black_melt, aes(x = Gamma_Black)) +
  geom_histogram(bins = 50, alpha = 0.6, fill = "red") +
  labs(title = "Distribution of Gamma_Black", x = "Gamma_Black", y = "Count") +
  theme_minimal()

# Extract gamma_white samples
gamma_white_samples <- extract(fit, "gamma_white")$gamma_white

# Extract gamma_black samples
gamma_black_samples <- extract(fit, "gamma_black")$gamma_black

# Extract W samples
W_samples <- extract(fit, "W")$W

# Convert each set of samples to an mcmc object
mcmc_gamma_white <- mcmc(gamma_white_samples)
mcmc_gamma_black <- mcmc(gamma_black_samples)
mcmc_W <- mcmc(W_samples)

# Plot the trace plots for each variable
par(mfrow = c(3, 1))  # Set the layout to 3 rows and 1 column
traceplot(mcmc_gamma_white, main = "Trace Plot for Gamma White")
traceplot(mcmc_gamma_black, main = "Trace Plot for Gamma Black")
traceplot(mcmc_W, main = "Trace Plot for W")



#acfplot(fit_mcmc[, "gamma_white[1]"], main = "Autocorrelation for Gamma_White[1]")

The Latent Position Model that will be examined in this dissertation is as follows:

$$\log\left(\frac{\Pr(y_{ij} \text{ games won by } i)}{\Pr((N_{ij} - y_{ij}) \text{ games won by } j)}\right) = \gamma_i - \gamma_j + W_i$$
for $i, j \in \{1, \ldots, N\}$.

$Y_{ij}$ is a binomial random variable with parameters $p_{ij}$, the success probability of player $i$ winning, and $N_{ij}$, the total number of games between player $i$ and player $j$.

I have 3 latent variables, but how can I create the latent positions for them? I am unsure whether I am taking the correct approach.

#LatentSpaceModel Modeling techniques RStan

[edit: escaped code (triple back ticks) and LaTeX ($).]

Bob_Carpenter · June 5, 2024, 9:09am

Hi, @Patrick_O_Rourke and thanks for joining the Stan community. Sorry it’s taken so long to get back to you about this. To start, you can simplify your likelihood to just this if you declare everything as a vector rather than an array:

vector[N] eta = (white * 2 - 1) .* (gamma_white - gamma_black) + W; 
y ~ binomial_logit(N_ij, eta);

Rather than repeating the arithmetic in the model block, you can define the sign vector white * 2 - 1 once in the transformed data block.
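As a minimal sketch of that suggestion, the full program might look like the one below. It keeps the parameters from the original post unchanged and only vectorizes the likelihood. The name `white_sign` is made up for illustration, and the `array[N] int` declarations assume a Stan version that accepts the newer array syntax.

```stan
data {
  int<lower=0> N;                        // number of rows in the data
  int<lower=1> P;                        // number of players
  array[N] int<lower=0> y;               // games won by player i
  array[N] int<lower=0> N_ij;            // total games played
  array[N] int<lower=1> i;               // player i index
  array[N] int<lower=1> j;               // player j index
  vector<lower=0, upper=1>[N] white;     // 1 if player i played white, else 0
}

transformed data {
  // Hypothetical helper: +1 where player i played white, -1 otherwise
  vector[N] white_sign = 2 * white - 1;
}

parameters {
  vector[N] gamma_white;
  vector[N] gamma_black;
  vector[N] W;
}

model {
  // Same priors as in the original program
  gamma_white ~ normal(0, 1);
  gamma_black ~ normal(0, 1);
  W ~ normal(0, 1);

  // Vectorized likelihood; binomial_logit applies the inverse logit internally
  y ~ binomial_logit(N_ij, white_sign .* (gamma_white - gamma_black) + W);
}
```

Declaring `white` as a vector is what makes the elementwise product `.*` possible; the 0/1 values passed from R stay the same.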

The code looks OK, but the LaTeX definition of the latent position model looks wrong: it reads like a Bernoulli probability for a single win, not a binomial.
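For reference, one way to state a binomial version, assuming the same symbols as in the original post and that $p_{ij}$ is the per-game probability of player $i$ beating player $j$, would be:

```latex
Y_{ij} \sim \operatorname{Binomial}(N_{ij},\, p_{ij}),
\qquad
\log\left(\frac{p_{ij}}{1 - p_{ij}}\right) = \gamma_i - \gamma_j + W_i
```

Here the log odds are defined for a single game, and the binomial distribution then describes the count of wins across the $N_{ij}$ games.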

> I have 3 latent variables but how can I create the latent positions for them?

What do you mean by “latent positions”?

In general, the easiest way to build models is to start simple (for instance, not including a white advantage effect), then build up with simulated data to make sure the model’s doing the right thing.
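A rough sketch of what the simulation step could look like for the simplest version (no white advantage) is below. It assumes one rating per player, indexed by `i` and `j`, which is not how the original program is parameterized. The names `gamma_sim` and `y_sim` are made up for illustration.

```stan
data {
  int<lower=1> N;                          // number of rows (player pairs)
  int<lower=1> P;                          // number of players
  array[N] int<lower=0> N_ij;              // games played in each row
  array[N] int<lower=1, upper=P> i;        // player i index
  array[N] int<lower=1, upper=P> j;        // player j index
}

generated quantities {
  vector[P] gamma_sim;                     // simulated player ratings
  array[N] int y_sim;                      // simulated wins for player i

  // Draw one rating per player from the prior
  for (p in 1:P) {
    gamma_sim[p] = normal_rng(0, 1);
  }

  // Draw win counts from the rating difference on the log odds scale
  for (n in 1:N) {
    y_sim[n] = binomial_rng(N_ij[n], inv_logit(gamma_sim[i[n]] - gamma_sim[j[n]]));
  }
}
```

A program like this has no parameters, so in RStan it would be run with `algorithm = "Fixed_param"`. Fitting the model to `y_sim` and comparing the estimates with `gamma_sim` shows whether the ratings can be recovered.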

Allowing the white advantage to vary by game is going to be problematic, because the model can put the entire effect for a game into the white advantage, and normal(0, 1) isn’t a particularly strong prior on the log odds scale. It would be better to make this hierarchical so you could shrink the effect so it doesn’t dominate, but I would start by just making it a single scalar. As written, you will wind up with a huge amount of posterior uncertainty because winning a game can be attributed either to the higher skill of the winning player or, if the winner is white, to the white advantage for that game.
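A minimal sketch of the single-scalar version follows, under two assumptions that go beyond what is written above: ratings are per player (`vector[P] gamma`, looked up through the `i` and `j` indices), and the white advantage enters with a positive sign when player i has white and a negative sign otherwise. The name `white_sign` is made up for illustration.

```stan
data {
  int<lower=0> N;                          // number of rows (player pairs)
  int<lower=1> P;                          // number of players
  array[N] int<lower=0> y;                 // games won by player i
  array[N] int<lower=0> N_ij;              // total games played
  array[N] int<lower=1, upper=P> i;        // player i index
  array[N] int<lower=1, upper=P> j;        // player j index
  vector<lower=0, upper=1>[N] white;       // 1 if player i played white, else 0
}

transformed data {
  // Assumption: +1 where player i played white, -1 where player i played black
  vector[N] white_sign = 2 * white - 1;
}

parameters {
  vector[P] gamma;                         // one rating per player (assumption)
  real W;                                  // single shared white advantage
}

model {
  gamma ~ normal(0, 1);
  W ~ normal(0, 1);

  // gamma[i] and gamma[j] pick out each row's two players by index
  y ~ binomial_logit(N_ij, gamma[i] - gamma[j] + white_sign * W);
}
```

With one `gamma` per player, games between different pairs inform the same ratings, and `W` is estimated from all rows at once rather than from a single game.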

Patrick_O_Rourke · June 5, 2024, 11:42am

Hi Bob,

Thank you for your reply.

The Latent Position Model has 3 latent variables:

Gamma_i = the chess rating for player i (White player)
Gamma_j = the chess rating for player j (Black player)
W_i = the adjustment factor of playing white

The goal is to create an alternative rating system for chess players and to see what the latent effect of playing as White is.

I have 1897 chess matches in my dataset with 900 players. Some pairs of players have multiple matches with each other.

The distribution of Y_ij in my model is BINOMIAL … not Bernoulli. I am currently only looking at wins and losses as the 2 outcomes for each game. Once I get this to work, I will move on to 3 outcomes, with the inclusion of draws.

Given this information, how can I find the latent positions for my latent variables?
