Hello,
I’m trying to build a model that estimates the probability that a pass in a soccer game is accurate. My data record who the passer is (player_id), which zone the passer is in (cat_zone_i), which zone the receiver is in (cat_zone_f), the time frame in which the pass is made (timeFrame), the current result for the player's team (cat_res), and whether the player's team is home or away (localia).
I think the best option is a bernoulli_logit likelihood.
My model is here:
passes_model = """
data {
int<lower=0> N; // number of observations (328530 with only players who have >10 passes; 328657 with all passes)
int players; // number of players 488
int zones_i; // number of field zones 8
int zones_f; // number of field zones 8
int time; // number of time frames 7
int res; // types of results (winning, losing, tying)
int loc; // number of home/away categories (localia)
int<lower=1,upper=players> player_id[N];
int<lower=1,upper=zones_i> cat_zone_i[N];
int<lower=1,upper=zones_f> cat_zone_f[N];
int<lower=1,upper=time> time_frame[N];
int<lower=1,upper=res> cat_res[N];
int<lower=1,upper=loc> localia[N];
int pase[N]; // dependent variable
}
parameters {
real alpha; // intercept
vector[players] beta_player; // coefficient associated with each player
vector[zones_i] beta_zones_i; // coefficient associated with each zone_i
vector[zones_f] beta_zones_f; // coefficient associated with each zone_f
vector[time] beta_time; // coefficient associated with each time frame
vector[res] beta_res; // coefficient associated with each result
vector[loc] beta_loc; // coefficient associated with each type of localia
real epsilon; // uncertainty / unexplained variance (declared here but not used in the model block)
}
model {
// priors
alpha ~ normal(0,1);
beta_player ~ normal(0,1);
beta_zones_i ~ normal(0,1);
beta_zones_f ~ normal(0,1);
beta_time ~ normal(0,1);
beta_res ~ normal(0,1);
beta_loc ~ normal(0,1);
pase ~ bernoulli_logit(alpha + beta_player[player_id] + beta_zones_i[cat_zone_i] +
beta_zones_f[cat_zone_f] + beta_time[time_frame] + beta_res[cat_res] + beta_loc[localia]);
}
"""
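To make explicit what I expect the likelihood line to compute for a single pass, here is a minimal plain-Python sketch. The function names inv_logit and pass_probability are only illustrative (they are not part of PyStan), and the coefficient values in the usage note are made up.

```python
import math


def inv_logit(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def pass_probability(alpha, beta_player, beta_zones_i, beta_zones_f,
                     beta_time, beta_res, beta_loc,
                     player_id, cat_zone_i, cat_zone_f,
                     time_frame, cat_res, localia):
    """Probability of an accurate pass for one observation.

    The index arguments are 1-based, as in the Stan data block,
    so 1 is subtracted before indexing the Python lists.
    """
    eta = (alpha
           + beta_player[player_id - 1]
           + beta_zones_i[cat_zone_i - 1]
           + beta_zones_f[cat_zone_f - 1]
           + beta_time[time_frame - 1]
           + beta_res[cat_res - 1]
           + beta_loc[localia - 1])
    return inv_logit(eta)
```

With every coefficient set to zero the linear predictor is zero, so pass_probability returns 0.5. Raising any single coefficient that applies to the observation raises the probability.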
Then I compiled it:
import pystan

passes_reg = pystan.model.StanModel(model_code=passes_model,
                                    model_name='passes_reg')
The data dictionary is built like this:
N = len(df_passes_ps.new_id)
players = len(df_passes_ps.new_id.unique())
zones_i = len(df_passes_ps.cat_zone_i.unique())
zones_f = len(df_passes_ps.cat_zone_f.unique())
time = len(df_passes_ps.timeFrame.unique())
res = len(df_passes_ps.cat_res.unique())
loc = len(df_passes_ps.localia.unique())
player_id = df_passes_ps.new_id
cat_zone_i = df_passes_ps.cat_zone_i
cat_zone_f = df_passes_ps.cat_zone_f
time_frame = df_passes_ps.timeFrame
cat_res = df_passes_ps.cat_res
localia = df_passes_ps.localia
pase = df_passes_ps.accurate
datos = {
    'N': N, 'players': players, 'zones_i': zones_i, 'zones_f': zones_f,
    'time': time, 'res': res, 'loc': loc,
    'player_id': player_id, 'cat_zone_i': cat_zone_i, 'cat_zone_f': cat_zone_f,
    'time_frame': time_frame, 'cat_res': cat_res, 'localia': localia,
    'pase': pase,
}
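Since the data block declares every index with lower=1 and upper equal to the number of unique values, each categorical column has to be coded as contiguous integers 1..K. How the columns of df_passes_ps are coded is not shown above, so the following is only a sketch of how that could be enforced and checked. The helper names to_stan_index and check_stan_index are hypothetical, and the labels are assumed to be sortable.

```python
def to_stan_index(labels):
    """Map arbitrary category labels to contiguous integer codes 1..K.

    Returns the list of codes and K, the number of distinct labels.
    """
    levels = sorted(set(labels))
    lookup = {level: i + 1 for i, level in enumerate(levels)}
    return [lookup[value] for value in labels], len(levels)


def check_stan_index(codes, upper):
    """True if every code is an int within the declared bounds 1..upper."""
    return all(isinstance(c, int) and 1 <= c <= upper for c in codes)
```

For example, to_stan_index([0, 2, 2, 5]) gives the codes [1, 2, 2, 3] with K = 3, which passes check_stan_index, while the raw values [0, 2, 2, 5] would not pass with upper=3.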
But when I run the sampler, it never finishes:
passes_fit = passes_reg.sampling(data=datos,
                                 iter=1000, chains=4,
                                 warmup=500, n_jobs=-1,
                                 seed=42)
I would appreciate it if you could tell me what the problem is.