Multinomial Logit: probability of choice of a soccer action

It means the loop runs for more iterations than there are items in actions. I guess my previous reply was somewhat ambiguous, so I’ll say it more explicitly: if the size of actions is Npairs×3 then the loop must be for (i in 1:Npairs).

@nhuurre Now I got the sampling to run, and the output looks as follows:

                    mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
beta_player[1,1]     0.1  4.6e-3   0.16  -0.22  -0.02   0.09   0.21   0.42   1305    1.0
beta_player[2,1]    0.76  4.2e-3   0.16   0.46   0.65   0.75   0.86   1.07   1419    1.0
beta_player[1,2]    0.84    0.01   0.51  -0.11   0.48   0.82   1.18   1.82   1806    1.0
beta_player[2,2]    0.31    0.01   0.51  -0.63  -0.06   0.29   0.65   1.31   1637    1.0
beta_player[1,3]    0.65  5.2e-3    0.2   0.25   0.51   0.65   0.79   1.06   1516    1.0
beta_player[2,3]   -0.07  5.1e-3    0.2  -0.47  -0.21  -0.07   0.07   0.34   1590    1.0
beta_player[1,4]    0.59  8.9e-3   0.35  -0.06   0.36   0.58   0.81   1.31   1538    1.0
beta_player[2,4]   -0.51  8.3e-3   0.35  -1.15  -0.74  -0.54  -0.29   0.23   1777    1.0

What does the format beta_player[a,b] mean?

I assume your parameters block looks like

parameters {
    matrix[2,Nplayers] beta_player;
    matrix[2,Nzones] beta_zone;
    matrix[2,Nloc] beta_loc;
    matrix[2,Nres] beta_res;
    matrix[2,Ntimes] beta_time;
}

and in the model block you have

for (i in 1:N) {
    vector[2] beta = beta_player[:,player_id[i]] + beta_all[:,pred_index[i]];
    actions[i] ~ multinomial(softmax(append_row(0.0, beta)));
}

and that actions[i] is {n_shots, n_passes, n_dribblings} in the ith row of the dataset.

Then the interpretation is something like this: beta_player[1,k]=0.65 means player number k is exp(0.65)=1.9 times as likely to make a pass as a shot, and beta_player[2,k]=-0.07 means that the same player is exp(-0.07)=0.93 times as likely (i.e. 7% less likely) to do a dribbling as a shot. But these ratios are also modified by the zone, time and other effects, so it’s not quite so straightforward.
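To make the arithmetic concrete, here is a minimal sketch that plugs the posterior means for player 3 from the summary table above into the softmax. It assumes all other effects (zone, time, etc.) are zero, which is only for illustration, and the variable names are made up for this example. Plugging in posterior means is also just an approximation; averaging over draws is the better way to get probabilities.

```python
import math

def softmax(xs):
    """Numerically stable softmax of a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Posterior means of beta_player[1,3] and beta_player[2,3] from the table.
beta_pass, beta_dribble = 0.65, -0.07

# Ratios relative to the reference action (shot), whose coefficient is fixed at 0.
odds_pass_vs_shot = math.exp(beta_pass)        # about 1.9
odds_dribble_vs_shot = math.exp(beta_dribble)  # about 0.93

# Probabilities of (shot, pass, dribbling), ASSUMING no zone/time/etc. effects.
probs = softmax([0.0, beta_pass, beta_dribble])

print(round(odds_pass_vs_shot, 2), round(odds_dribble_vs_shot, 2))
print([round(p, 2) for p in probs])
```

Note that probs[1] / probs[0] equals exp(beta_pass) exactly, which is why the coefficients read as log ratios against the shot category.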

Perfect!

So, what is the functional form of the multinomial function in Stan?

Could you tell me, or point me to where I can find information about that?

The Stan Functions Reference has pages on the multinomial distribution and on softmax.
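For quick reference, these are the standard textbook forms of the two functions, written for K categories (here K = 3: shot, pass, dribbling). The symbols y, theta, N and x are notation chosen for this sketch; the Functions Reference remains the authoritative source.

```latex
% Multinomial pmf for counts y = (y_1, ..., y_K) with N = \sum_k y_k
\mathrm{Multinomial}(y \mid \theta)
  = \frac{N!}{\prod_{k=1}^{K} y_k!} \prod_{k=1}^{K} \theta_k^{y_k}

% softmax maps a real vector x to a probability vector
\mathrm{softmax}(x)_k = \frac{\exp(x_k)}{\sum_{j=1}^{K} \exp(x_j)}

% With x = (0, \beta_1, \beta_2) as in the model block:
\theta_{\text{shot}} = \frac{1}{1 + e^{\beta_1} + e^{\beta_2}}, \quad
\theta_{\text{pass}} = \frac{e^{\beta_1}}{1 + e^{\beta_1} + e^{\beta_2}}, \quad
\theta_{\text{dribbling}} = \frac{e^{\beta_2}}{1 + e^{\beta_1} + e^{\beta_2}}
```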

It may be helpful to compute some predicted probabilities to examine. For example

import numpy as np
from scipy.special import softmax  # same as Stan's softmax
# extract draws from the fit
draws = fit.extract()
beta_player = draws['beta_player']
beta_zone = draws['beta_zone']
beta_loc = draws['beta_loc']
# etc
# let's say we're interested in player number 50
# on zone 3, localia 1, ... (NB: Python indexing starts from 0, so subtract 1)
beta = beta_player[:, :, 49] + beta_zone[:, :, 2] + beta_loc[:, :, 0]  # + etc
# beta is an (N_draws, 2) array; prepend the zero column for the reference action
beta = np.column_stack((np.zeros(beta.shape[0]), beta))
# apply softmax to each draw, then calculate the average over draws
probs = softmax(beta, axis=1).mean(axis=0)
probs[0]  # probability of a shot
probs[1]  # probability of a pass
probs[2]  # probability of a dribbling

Btw, this thread is getting quite long. If you need more help interpreting the coefficients you could start a new thread about that. More people will see it.
