RuntimeError: Goal Soccer Model

Hello,

I’m trying to build a model that estimates the probability of scoring a goal in a soccer game. My data record who the kicker is (player_id), who the goalkeeper is (glk_id), which zone the kicker is in (cat_zone), the time frame in which the shot is taken (timeFrame), the current result for the kicker’s team (cat_res), and whether that team is playing home or away (localia).

I think the best option is using a bernoulli_logit.

The model is as follows:

goals_model = """
data {
    int<lower=0> N; // number of observations 8451
    int players; // number of players 426
    int glk; // number of goalkeepers 38
    int zones; // number of field zones 8
    int time; // number of time frames 7
    int res; // types of results (winning, losing, tying)
    int loc; // localia
    
    vector[N] player_id;
    vector[N] glk_id;
    vector[N] cat_zone;
    vector[N] timeFrame;
    vector[N] cat_res;
    vector[N] localia;
    
    int goal[N]; // dependent variable
}
parameters {

    real alpha; // intercept
    
    vector[players] beta_player; // coefficient associated with each player
    vector[glk] beta_glk; // coefficient associated with each goalkeeper
    vector[zones] beta_zones; // coefficient associated with each zone
    vector[time] beta_time; // coefficient associated with each time frame
    vector[res] beta_res; // coefficient associated with each result
    vector[loc] beta_loc; // coefficient associated with each type of localia
    
    real epsilon; //Uncertainty / unexplained variance
}
model {
    // priors
    alpha ~ normal(0,1);
    beta_player ~ normal(0,1);
    beta_glk ~ normal(0,1);
    beta_zones ~ normal(0,1);
    beta_time ~ normal(0,1);
    beta_res ~ normal(0,1);
    beta_loc ~ normal(0,1);
 
    goal ~ bernoulli_logit(alpha + beta_player .* player_id + beta_glk .* glk_id + 
        beta_zones .* cat_zone + beta_time .* timeFrame + beta_res .* cat_res + beta_loc .* localia);
     
}
"""

Then, I ran the following code:

goal_reg = pystan.model.StanModel(model_code=goals_model, model_name='goal_reg')

But when I then try this:

lin_fit = goal_reg.sampling(data=datos,
iter=1000, chains=4,
warmup=500, n_jobs=-1,
seed=42)

I get this error:

"RuntimeError: Exception: elt_multiply: Rows of m1 (2) and rows of m2 (8451) must match in size (in 'unknown file name' at line 43)"

I would appreciate it if someone could help me with this. It is for my final university project.

The error points to line 43, which contains expressions like beta_player .* player_id. That is an element-wise multiplication of two vectors of different sizes. But you shouldn’t multiply by the player ID at all. Instead, the ID should be used as an index into the beta_player vector.
Try this code:

data {
    ...
    int<lower=1,upper=players> player_id[N];
    int<lower=1,upper=glk> glk_id[N];
    int<lower=1,upper=zones> cat_zone[N];
    ....
}
...
model {
    ...
    goal ~ bernoulli_logit(alpha + beta_player[player_id] + beta_glk[glk_id] + ...);
}
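
To see why indexing is the right operation, here is a minimal NumPy sketch of the same lookup. The sizes and values are made up for illustration (3 players, 5 shots), and NumPy is 0-based, so the sketch subtracts 1 to mimic Stan's 1-based beta_player[player_id]:

```python
import numpy as np

# Hypothetical toy data: 3 players, 5 shots.
beta_player = np.array([0.5, -0.2, 1.0])  # one coefficient per player
player_id = np.array([1, 3, 3, 2, 1])     # 1-based IDs, as Stan expects

# Element-wise multiplication needs equal lengths, so this fails,
# much like elt_multiply does in the Stan model.
try:
    beta_player * player_id
    multiply_failed = False
except ValueError:
    multiply_failed = True

# Indexing picks one coefficient per shot and returns a length-5 array.
per_shot = beta_player[player_id - 1]

print(multiply_failed)     # True
print(per_shot.tolist())   # [0.5, 1.0, 1.0, -0.2, 0.5]
```

The result has one entry per observation, which is what bernoulli_logit needs for a vectorized statement over goal.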

Thank you, nhuurre, for your time.

I have now tried this, including the changes you suggested:

goals_model = """
data {
    int<lower=0> N; // number of observations 8451
    int players; // number of players 426
    int glk; // number of goalkeepers 38
    int zones; // number of field zones 8
    int time; // number of time frames 7
    int res; // types of results (winning, losing, tying)
    int loc; // localia
    
    int<lower=1,upper=players> player_id[N];
    int<lower=1,upper=glk> glk_id[N];
    int<lower=1,upper=zones> cat_zone[N];
    int<lower=1,upper=time> time_frame[N];
    int<lower=1,upper=res> cat_res[N];
    int<lower=0,upper=loc> localia[N];
    
    int goal[N]; // dependent variable
}
parameters {

    real alpha; // intercept
    
    vector[players] beta_player; // coefficient associated with each player
    vector[glk] beta_glk; // coefficient associated with each goalkeeper
    vector[zones] beta_zones; // coefficient associated with each zone
    vector[time] beta_time; // coefficient associated with each time frame
    vector[res] beta_res; // coefficient associated with each result
    vector[loc] beta_loc; // coefficient associated with each type of localia
    
    real epsilon; //Uncertainty / unexplained variance
}
model {
    // priors
    alpha ~ normal(0,1);
    beta_player ~ normal(0,1);
    beta_glk ~ normal(0,1);
    beta_zones ~ normal(0,1);
    beta_time ~ normal(0,1);
    beta_res ~ normal(0,1);
    beta_loc ~ normal(0,1);
 
    goal ~ bernoulli_logit(alpha + beta_player[player_id] + beta_glk[glk_id] + 
        beta_zones[cat_zone] + beta_time[time_frame] + beta_res[cat_res] + beta_loc[localia]);
     
}
"""

Then I run this:

goal_fit = goal_reg.sampling(data=datos, iter=1000, chains=4, warmup=500, n_jobs=-1, seed=42)

But I get this error:

"RuntimeError: Exception: vector[multi] indexing: accessing element out of range. index 0 out of range; expecting index to be between 1 and 2 (in 'unknown file name' at line 43)"

Remember that indices start at 1 in Stan (Python starts at 0).


Yes, I know, but I don’t know what I’m doing wrong.
If you can help, let me know what other information I can give you, because I have not been able to identify which index starts at 0.

Thanks

Sure, do you have the code that creates the indices?

If they are np.array objects, you can shift them to 1-based with:

x = x + 1
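
If the raw IDs are labels or non-contiguous numbers rather than 0-based integers, adding 1 is not enough. A minimal sketch of one way to build contiguous 1-based indices with np.unique follows; the helper name to_stan_index and the sample labels are hypothetical, not taken from the original data:

```python
import numpy as np

def to_stan_index(raw_ids):
    """Map arbitrary labels to contiguous 1-based integer indices.

    Returns the index array and the sorted unique labels, so that
    labels[k - 1] is the label behind Stan index k.
    """
    labels, inverse = np.unique(np.asarray(raw_ids), return_inverse=True)
    return inverse.ravel() + 1, labels

# Hypothetical raw column; the real data may look different.
raw_localia = ["home", "away", "away", "home"]
idx, labels = to_stan_index(raw_localia)

print(idx.tolist())     # [2, 1, 1, 2]
print(labels.tolist())  # ['away', 'home']
```

The number of unique labels (len(labels)) is then the value to pass for the matching size variable, such as loc.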

You’re using the IDs as indices, so it should be pretty easy to find which ID is 0.
I’m guessing it’s localia, based on this line:

data {
    ...
    int<lower=0,upper=loc> localia[N];
}

The lower bound should be lower=1, and the localia values in the data must be recoded to start at 1 as well.
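
A quick way to find the offending ID on the Python side is to check every index array against its declared range before calling sampling(). This is only a sketch: the helper find_bad_indices and the small datos_demo dictionary are hypothetical, and it assumes the dictionary keys match the Stan variable names:

```python
import numpy as np

def find_bad_indices(data, index_to_size):
    """Return {index_name: (min, max, size)} for arrays outside 1..size."""
    problems = {}
    for idx_name, size_name in index_to_size.items():
        values = np.asarray(data[idx_name])
        size = int(data[size_name])
        if values.min() < 1 or values.max() > size:
            problems[idx_name] = (int(values.min()), int(values.max()), size)
    return problems

# Hypothetical miniature of the dictionary passed to sampling().
datos_demo = {
    "players": 3,
    "loc": 2,
    "player_id": [1, 2, 3, 1],
    "localia": [0, 1, 1, 0],  # 0-based by mistake
}
pairs = {"player_id": "players", "localia": "loc"}

bad = find_bad_indices(datos_demo, pairs)
print(bad)  # {'localia': (0, 1, 2)}
```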


Thank you, nhuurre, it now runs without any problem.
Do you know where I can find something that explains how to interpret the results of the fit?

                   mean  se_mean       sd     2.5%      25%      50%      75%    97.5%    n_eff     Rhat
alpha             -1.58     0.02     0.73    -3.01    -2.09    -1.59    -1.06    -0.16     1559      1.0
beta_player[1]     0.92   5.4e-3     0.35     0.18      0.7     0.94     1.14     1.58     4201      1.0
beta_player[2]     0.95   5.4e-3     0.35     0.22     0.74     0.95     1.18     1.64     4188      1.0
beta_player[3]     0.83   6.4e-3     0.45    -0.11     0.55     0.84     1.13     1.72     5086      1.0
beta_player[4]  -7.0e-3   9.3e-3     0.54    -1.15    -0.36     0.03     0.36     0.96     3357      1.0
...
lp__              -2840     0.55    15.61    -2871    -2850    -2840    -2829    -2810      809      1.0
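
As a rough starting point for reading this table: because the model uses bernoulli_logit, every coefficient is on the log-odds scale, and the inverse logit turns a sum of coefficients into a probability. The sketch below plugs in the posterior means of alpha and beta_player[1] from the summary and assumes all other effects (goalkeeper, zone, time frame, result, localia) are zero, which is a simplification. Applying the transform to each posterior draw and then summarizing would be more faithful than plugging in means.

```python
import math

def inv_logit(x):
    """Map a log-odds value to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Posterior means copied from the summary table above.
alpha_mean = -1.58
beta_player_1_mean = 0.92

# Assumption: every other effect in the linear predictor is set to 0.
baseline = inv_logit(alpha_mean)
with_player_1 = inv_logit(alpha_mean + beta_player_1_mean)

print(round(baseline, 2))       # 0.17
print(round(with_player_1, 2))  # 0.34
```

The 2.5% and 97.5% columns bound a 95% posterior interval for each parameter, n_eff is the effective sample size, and Rhat values close to 1 suggest the chains have mixed.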