# Sum of event probability is not equal to one

Hi all,

I have a number of players and I am interested in their skill. In order to determine if the model makes sense I want to build some checks.

Inspired by this blog post by Bob Carpenter I write the following code

``````generated quantities {
int<lower=1, upper=n_players> rank[n_players];
int dsc[n_players] = sort_indices_desc(skill);
int<lower=0, upper=1> is_best[n_players];
int<lower=0, upper=1> is_2nd_best[n_players];
int<lower=0, upper=1> is_3rd_best[n_players];

for (player in 1:n_players) {
rank[dsc[player]] = player;
is_best[player] = (rank[player] == 1);
is_2nd_best[player] = (rank[player] == 2);
is_3rd_best[player] = (rank[player] == 3);
}
}
``````

The sum of `is_best` is 1 as expected (of the n_players there should be one the best. ).
However, this is not the case for `is_2nd_best` and `is_3rd_best`
I donâ€™t understand why the above works for `is_best` but not for the other two variables?
I also tried to define the variables as a a simplex, but then the script crashes (again at `is_2nd_best` and `is_3rd_best`)

Please show me whatâ€™s wrong. Thank you

Are you running the generated quantities separately after sampling? Itâ€™s a known issue that Stan writes to csv with 6 digit accuracy, but the simplex sum to 1 is making the check with 8 digit accuracy. You can increase the number of digits saved, but to give specific instructions it would help to know which interface you are using.

No, the â€śgenerated quantitiesâ€ť-code is not a separate script.
The difference is much larger than that. I believe the sum was ~0.70.

Regardless of whether the upstream parts of the model are doing what you want, `is_best`, `is_2nd_best`, etc can only contain the integers 0 and 1. Thus, it is impossible that the sum of `is_2nd_best` could ever be 0.7. Perhaps there is an error in how you take the sum?

1 Like

Yes I agree, so I donâ€™t understand whats going wrong.

This is how it looks in my notebook. In both cases its the second player that has a zero probability of being 2nd / 3th.

Looks like you are summing the means, not across draws.

1 Like

What do you mean exactly? There are 7 players, so we see the correct shape of the output?
`fit.posterior.is_best.mean(dim=('chain', 'draw')))` is the correct syntax as I specify both `chain` and `draw`.
Its basically the equivalent of `np.mean(a, axis=0 & 1)`

It doesnâ€™t make sense that the second player has a zero probability of being 2nd or 3rd. I think this is also causing the simplex to fail?

This is the output of `fit.posterior.is_2nd_best.mean(dim=('draw'))`

This is the output of `fit.posterior.is_2nd_best.mean(dim=('chain'))`