Hello,
I’m trying to build a model that estimates the probability that a shot in a soccer game results in a goal. For each shot, my data record who the kicker is (player_id), who the goalkeeper is (glk_id), which zone the kicker is in (cat_zone), the time frame in which the shot is taken (timeFrame), the current result for the kicker's team (cat_res), and whether that team is playing at home or away (localia).
I think the best option is a bernoulli_logit likelihood.
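Written out, this would presumably correspond to the following, assuming each categorical predictor contributes one additive coefficient per level (a reading of the parameter block, with i indexing the shots and the superscripts used only as labels):

$$
\text{goal}_i \sim \text{Bernoulli}\big(\text{logit}^{-1}(\eta_i)\big), \qquad
\eta_i = \alpha
  + \beta^{\text{player}}_{\text{player\_id}[i]}
  + \beta^{\text{glk}}_{\text{glk\_id}[i]}
  + \beta^{\text{zones}}_{\text{cat\_zone}[i]}
  + \beta^{\text{time}}_{\text{timeFrame}[i]}
  + \beta^{\text{res}}_{\text{cat\_res}[i]}
  + \beta^{\text{loc}}_{\text{localia}[i]}
$$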
The Stan model is as follows:
goals_model = """
data {
  int<lower=0> N;  // number of observations (8451)
  int players;     // number of players (426)
  int glk;         // number of goalkeepers (38)
  int zones;       // number of field zones (8)
  int time;        // number of time frames (7)
  int res;         // types of results (winning, losing, tying)
  int loc;         // localia (home/away)
  vector[N] player_id;
  vector[N] glk_id;
  vector[N] cat_zone;
  vector[N] timeFrame;
  vector[N] cat_res;
  vector[N] localia;
  int goal[N];     // dependent variable
}
parameters {
  real alpha;                   // intercept
  vector[players] beta_player;  // coefficient associated with each player
  vector[glk] beta_glk;         // coefficient associated with each goalkeeper
  vector[zones] beta_zones;     // coefficient associated with each zone
  vector[time] beta_time;       // coefficient associated with each time frame
  vector[res] beta_res;         // coefficient associated with each result
  vector[loc] beta_loc;         // coefficient associated with each type of localia
  real epsilon;                 // uncertainty / unexplained variance
}
model {
  // priors
  alpha ~ normal(0,1);
  beta_player ~ normal(0,1);
  beta_glk ~ normal(0,1);
  beta_zones ~ normal(0,1);
  beta_time ~ normal(0,1);
  beta_res ~ normal(0,1);
  beta_loc ~ normal(0,1);
  goal ~ bernoulli_logit(alpha + beta_player .* player_id + beta_glk .* glk_id +
    beta_zones .* cat_zone + beta_time .* timeFrame + beta_res .* cat_res + beta_loc .* localia);
}
"""
Then I compiled the model:
goal_reg = pystan.model.StanModel(model_code=goals_model, model_name='goal_reg')
But when I then try to sample from it:
lin_fit = goal_reg.sampling(data=datos,
                            iter=1000, chains=4,
                            warmup=500, n_jobs=-1,
                            seed=42)
I get this error:
"RuntimeError: Exception: elt_multiply: Rows of m1 (2) and rows of m2 (8451) must match in size (in 'unknown file name' at line 43)"
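The message appears to come from the `.*` operator, which multiplies two vectors element by element and therefore needs both to have the same number of rows. Below is a minimal pure-Python sketch of the same mismatch. It assumes the 2 in the message is the length of beta_loc (home/away) and 8451 is N. The `elt_multiply` helper is only a hypothetical stand-in for Stan's operator, and the ids are randomly generated rather than real data. The last step shows the alternative of indexing, where each shot looks up the one coefficient that belongs to its category.

```python
import random


def elt_multiply(m1, m2):
    """Stand-in for Stan's `.*`: element-wise product of equal-length sequences."""
    if len(m1) != len(m2):
        raise ValueError(
            f"Rows of m1 ({len(m1)}) and rows of m2 ({len(m2)}) must match in size"
        )
    return [a * b for a, b in zip(m1, m2)]


random.seed(42)
N = 8451   # number of shots, taken from the comment in the data block
n_loc = 2  # assumed number of localia levels: home / away

# One coefficient per level, and one made-up 1-based level id per shot.
beta_loc = [random.gauss(0.0, 1.0) for _ in range(n_loc)]
localia = [random.randint(1, n_loc) for _ in range(N)]

# What the model statement asks for: a length-2 vector times a length-8451 vector.
try:
    elt_multiply(beta_loc, localia)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Indexing instead: pick the coefficient for each shot's level.
effect = [beta_loc[k - 1] for k in localia]  # minus 1 because Python is 0-based

print(error_message)
print(len(effect))
```

Run as written, the first print repeats the size complaint with 2 and 8451, and the second shows that indexing yields one value per shot.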
I would appreciate it if someone could help me with this. It is for my final university project.
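For reference, one possible direction is to declare the id variables as bounded integer arrays and use them to index the coefficient vectors instead of multiplying by them. The sketch below is written under that assumption and has not been run against the real data. `to_index` and `goals_model_indexed` are hypothetical names, the ids are assumed to be recoded to 1..K before being passed to Stan, and epsilon is left out because it has no prior and is never used in the model block.

```python
def to_index(values):
    """Recode arbitrary labels as 1-based integer codes, as Stan indexing expects."""
    levels = sorted(set(values))
    code = {level: i + 1 for i, level in enumerate(levels)}
    return [code[v] for v in values], len(levels)


# Hypothetical variant of the model: ids are int arrays used as indices.
goals_model_indexed = """
data {
  int<lower=0> N;
  int<lower=1> players;
  int<lower=1> glk;
  int<lower=1> zones;
  int<lower=1> time;
  int<lower=1> res;
  int<lower=1> loc;
  int<lower=1, upper=players> player_id[N];
  int<lower=1, upper=glk> glk_id[N];
  int<lower=1, upper=zones> cat_zone[N];
  int<lower=1, upper=time> timeFrame[N];
  int<lower=1, upper=res> cat_res[N];
  int<lower=1, upper=loc> localia[N];
  int<lower=0, upper=1> goal[N];
}
parameters {
  real alpha;
  vector[players] beta_player;
  vector[glk] beta_glk;
  vector[zones] beta_zones;
  vector[time] beta_time;
  vector[res] beta_res;
  vector[loc] beta_loc;
}
model {
  alpha ~ normal(0, 1);
  beta_player ~ normal(0, 1);
  beta_glk ~ normal(0, 1);
  beta_zones ~ normal(0, 1);
  beta_time ~ normal(0, 1);
  beta_res ~ normal(0, 1);
  beta_loc ~ normal(0, 1);
  // Each indexed term is a length-N vector: one coefficient per shot.
  goal ~ bernoulli_logit(alpha + beta_player[player_id] + beta_glk[glk_id]
                         + beta_zones[cat_zone] + beta_time[timeFrame]
                         + beta_res[cat_res] + beta_loc[localia]);
}
"""

# Usage: recode a made-up localia column before putting it in the data dict.
codes, n_levels = to_index(["home", "away", "home"])
print(codes, n_levels)  # [2, 1, 2] 2
```

With an intercept plus a full set of coefficients for every factor, the individual effects are likely pinned down only by the normal(0, 1) priors, so the sampler output would be worth checking for that.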