Help with creating desired vector

Hi all,

I am working on a my own version of the paper of Baio and Blangiardo. They create a Bayesian hierarchical model on team level to determine the attack and defense parameters of each team. They use these parameters and home-advantage-parameter to forecast game scores. Visually, their layout looks as follows:

My interpretation of their model in Stan is

data { 
int<lower=1> n_teams; 
int<lower=1> n_matches;
int<lower=1, upper=n_teams> home_team[n_matches];
int<lower=1, upper=n_teams> away_team[n_matches];
int<lower=0> home_goals[n_matches];
int<lower=0> away_goals[n_matches];

parameters {
real home;
real mu_att;
real mu_def;
real<lower=0> tau_att;
real<lower=0> tau_def;
vector[n_teams - 1] raw_att;
vector[n_teams - 1] raw_def;

transformed parameters {
vector[n_teams] att;
vector[n_teams] def;
vector[n_matches] home_log_theta;
vector[n_matches] away_log_theta;

for (team in 1:(n_teams - 1)) {
    att[team] = raw_att[team];
    def[team] = raw_def[team];

att[n_teams] = -sum(raw_att);
def[n_teams] = -sum(raw_def);

home_log_theta = exp(home + att[home_team] + def[away_team]);
away_log_theta = exp(att[away_team] + def[home_team]);

model {  
// priors 
home ~ normal(0, 100);
mu_att ~ normal(0, 10);
mu_def ~ normal(0, 10);
tau_att ~ gamma(0.01, 0.01);
tau_def ~ gamma(0.01, 0.01);
raw_att ~ normal(mu_att, tau_att);
raw_def ~ normal(mu_def, tau_def);
home_goals ~ poisson(home_log_theta);
away_goals ~ poisson(away_log_theta);

Now, I would like to make my model on a player-level. I assume that for each team:

  • the attack parameter is equal to exp(sum(attack parameters of each player of team))
  • the defense parameter is equal to exp(sum(defense parameters of each player of team))


In the graph I only draw the player-lines for the home team, but the same method applies for the away team.
My code looks as follows:

data {
  int<lower=1> n_teams;
  int<lower=1> n_games;
  int<lower=1> n_players
  int<lower=1, upper=n_teams> home_team_id[n_games];
  int<lower=1, upper=n_teams> away_team_id[n_games];
  int<lower=1, upper=n_players> player_id[n_games];
  int<lower=0> home_goals[n_games];
  int<lower=0> away_goals[n_games];

parameters {
  real home_advantage;
  real player_att_mu;
  real player_att_tau;
  real player_def_mu;
  real player_def_tau;
  real team_att_raw[n_teams - 1];
  real team_def_raw[n_teams - 1];
  real player_att_raw[n_players - 1];
  real player_def_raw[n_players - 1];

transformed parameters {
  // Enforce sum-to-zero constraint
  real team_att[n_teams];
  real team_def[n_teams];
  real player_att[n_players];
  real player_def[n_players];

  vector[n_games] home_expected_goals;
  vector[n_games] away_expected_goals;

  for (team in 1:(n_teams-1)) {
    team_att[team] = team_att_raw[team];
    team_def[team] = team_def_raw[team];

  team_att[n_teams] = -sum(team_att_raw);
  team_def[n_teams] = -sum(team_def_raw);

  for (player in 1:(n_players-1)) {
    player_att[player] = player_att_raw[player]
    player_def[player] = player_def_raw[player]
  player_att[n_players] = -sum(player_att_raw);
  player_def[n_players] = -sum(player_def_raw);
  for (game in 1:n_games) {
    team_att[home_team_id[game]] = exp(sum(player_att[player_id[game]]));     // problem
    team_def[home_team_id[game]] = exp(sum(player_def[player_id[game]]));     // problem
    team_att[away_team_id[game]] = exp(sum(player_att[player_id[game]]));     // problem
    team_def[away_team_id[game]] = exp(sum(player_def[player_id[game]]));     // problem

    home_expected_goals[game] = exp(team_att[home_team_id[game]] + team_def[away_team_id[game]] + home_advantage);
    away_expected_goals[game] = exp(team_att[away_team_id[game]] + team_def[home_team_id[game]]);

model {
  // Priors (uninformative)
  player_att_mu ~ normal(0, 10);
  player_def_mu ~ normal(0, 10);
  player_att_tau ~ gamma(0.1, 0.1);
  player_def_tau ~ gamma(0.1, 0.1);
  home_advantage ~ normal(0, 100);

  home_goals ~ poisson(home_expected_goals);
  away_goals ~ poisson(away_expected_goals);

How can I make sure that my code knows for each game which players belonged to the home team and which players belonged to the away team. I need to specify somewhere a vector with the player_ids of each team. But I don’t see how this can be done. Please advice! Help is much appreciated

1 Like

I don’t think that this is going to work because presumably each player typically belongs to one team, so the model can’t identify their unique contributions from just the team performance.

Yes, correct I assume that each player belongs to only team (still there could be transfers in the winter period but let’s ignore that). But the team selection is likely to change per match right? Due to injuries, suspensions etc. In addition, I could weight each player for % of the match he played.
This paper estimates the individual strengths from team-level results. Why would the method not work for my particular case? Thanks for brainstorming with me

Yes, the model will become more well-“identified” the more that there is variability from game to game in which players are actually playing for a given team. Furthermore, if playing in a game is less a dichotomous played-or-didn’t-play state and more a played-for-X-minutes thing, then your idea for using weights to indicate the latter is sound.

As for implementation, gimme a bit to mull on how best to achieve that…