I took another stab at this.

You have 514 players and 504 predictor states. That makes a total of $514 \times 504 \approx 259{,}000$ possible unique `(player, p_index)` pairs, but you have almost *twice* that many observations, so there must be some duplicate predictor conditions. That lets you compress the dataset to about half its size by combining duplicate conditions and recording the actions as counts for each predictor combination, so that the data look like

```
data {
  int Nobs;
  int<lower=1,upper=Nplayers> player_id[Nobs];
  int<lower=1,upper=Nall> pred_index[Nobs];
  int<lower=0> actions[Nobs,3];
}
```

where `actions[i,1]` is the observed number of shots, `actions[i,2]` is the number of passes, and `actions[i,3]` is the number of dribbles.
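As a sketch of that compression step, assuming the raw data are one row per event (the column names and action codes here are hypothetical, not from your dataset):

```python
from collections import Counter

# Toy raw data: one (player, p_index, action) triple per observed event.
# Hypothetical action codes: 1=shot, 2=pass, 3=dribble.
raw = [
    (1, 7, 1), (1, 7, 2), (1, 7, 2),
    (2, 7, 3), (2, 7, 3),
]

# Tally each action type within every (player, p_index) pair.
tallies = Counter(raw)
pairs = sorted({(p, q) for p, q, _ in raw})

# One row per unique pair: [shots, passes, dribbles] -- the `actions` matrix.
actions = [[tallies[(p, q, a)] for a in (1, 2, 3)] for p, q in pairs]
print(pairs)    # [(1, 7), (2, 7)]
print(actions)  # [[1, 2, 0], [0, 0, 2]]
```

The five raw events collapse to two rows, one per unique `(player, p_index)` pair.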

Counts of repeated categorical draws follow the multinomial distribution.

```
model {
  ...
  for (i in 1:Nobs) {
    actions[i] ~ multinomial_logit(beta_player[:,player_id[i]] + beta_all[:,pred_index[i]]);
  }
}
```

(PyStan doesn’t have `multinomial_logit`. You can either use `multinomial(softmax(...))` instead or install CmdStanPy.)
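For intuition on the `softmax` workaround: softmax just exponentiates the logits and normalizes them into probabilities, e.g. in plain Python:

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability, exponentiate, normalize.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

beta = [0.5, 1.0, -0.3]  # arbitrary example logits
probs = softmax(beta)
print(probs)        # three probabilities summing to 1
```

`multinomial(softmax(beta))` then gives exactly the same likelihood as `multinomial_logit(beta)`.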

**However, I think there’s a bigger problem with the model.**

The beta parameter to categorical/multinomial has three components, controlling the probability of a shot, a pass, or a dribble, respectively.

The idea is that a larger `beta[1]` increases the probability of a shot and a larger `beta[2]` increases the probability of a pass. But it’s not quite that simple, because the three probabilities must add up to one: increasing one `beta` necessarily decreases the other two probabilities, and shifting all three betas by the same amount leaves the probabilities unchanged. Although it is convenient to parametrize the distribution by three numbers, there are in fact only two degrees of freedom.
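That shift invariance is easy to check numerically; a minimal sketch:

```python
import math

def softmax(logits):
    # Exponentiate (stably) and normalize into probabilities.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

beta = [0.2, -1.0, 0.7]            # arbitrary logits
shifted = [b + 5.0 for b in beta]  # add the same constant to every component

# Identical probabilities: only the *differences* between components matter,
# so the three-parameter version has just two degrees of freedom.
print(softmax(beta))
print(softmax(shifted))
```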

In theory this is not a problem; the model will simply integrate the extra parameter over its prior.

Unfortunately, in practice (1) it complicates the interpretation of `beta`, and (2) the model carries one redundant parameter for *every* player, zone, localia, etc., so a full third of the parameters are useless, and that’s quite a lot of wasted work.

One way to remove the extra parameters is this:

```
parameters {
  matrix[2,Nplayers] beta_player;
  matrix[2,Nzones] beta_zone;
  matrix[2,Nloc] beta_loc;
  matrix[2,Nres] beta_res;
  matrix[2,Ntimes] beta_time;
}
model {
  matrix[2,Nall] beta_all;
  for (z in 1:Nzones) {
    vector[2] bz = beta_zone[:,z];
    ...
  }
  ...
  for (i in 1:Nobs) {
    vector[2] beta = beta_player[:,player_id[i]] + beta_all[:,pred_index[i]];
    actions[i] ~ multinomial_logit(append_row(0.0, beta));
  }
}
```

Here `append_row` builds the three-component vector `[0.0, beta[1], beta[2]]`.

The shot probability is fixed as the reference value: `beta[1]>0` means a pass is more likely than a shot, and `beta[1]<0` means a pass is less likely than a shot. `beta[2]` works the same way for dribbles.
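To see the reference-category coding in action, here is a small numerical check (plain Python, with `softmax` as the usual normalized exponentials):

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Two free parameters; the shot logit is pinned at 0 as the reference.
beta = [0.8, -0.4]      # example [pass, dribble] logits relative to shot
full = [0.0] + beta     # what append_row(0.0, beta) builds in Stan
p_shot, p_pass, p_dribble = softmax(full)

print(p_pass > p_shot)      # True: the pass logit is positive
print(p_dribble < p_shot)   # True: the dribble logit is negative
```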

Fixing one value like that makes thinking about priors a bit more complicated, but a standard normal on the logit scale is fairly noninformative anyway, so maybe you’re not too concerned about priors.