Sum of event probability is not equal to one

Hi all,

I have a number of players and I am interested in their skill. In order to determine if the model makes sense I want to build some checks.

Inspired by this blog post by Bob Carpenter I write the following code

generated quantities {
  int<lower=1, upper=n_players> rank[n_players];   
  int dsc[n_players] = sort_indices_desc(skill);
  int<lower=0, upper=1> is_best[n_players];
  int<lower=0, upper=1> is_2nd_best[n_players];
  int<lower=0, upper=1> is_3rd_best[n_players];
  
  for (player in 1:n_players) {
      rank[dsc[player]] = player;
      is_best[player] = (rank[player] == 1);
      is_2nd_best[player] = (rank[player] == 2);
      is_3rd_best[player] = (rank[player] == 3);
  }
}

The sum of is_best is 1 as expected (of the n_players there should be one the best. ).
However, this is not the case for is_2nd_best and is_3rd_best
I don’t understand why the above works for is_best but not for the other two variables?
I also tried to define the variables as a a simplex, but then the script crashes (again at is_2nd_best and is_3rd_best)

Please show me what’s wrong. Thank you

Are you running the generated quantities separately after sampling? It’s a known issue that Stan writes to csv with 6 digit accuracy, but the simplex sum to 1 is making the check with 8 digit accuracy. You can increase the number of digits saved, but to give specific instructions it would help to know which interface you are using.

No, the “generated quantities”-code is not a separate script.
The difference is much larger than that. I believe the sum was ~0.70.

Regardless of whether the upstream parts of the model are doing what you want, is_best, is_2nd_best, etc can only contain the integers 0 and 1. Thus, it is impossible that the sum of is_2nd_best could ever be 0.7. Perhaps there is an error in how you take the sum?

1 Like

Yes I agree, so I don’t understand whats going wrong.



This is how it looks in my notebook. In both cases its the second player that has a zero probability of being 2nd / 3th.

Looks like you are summing the means, not across draws.

1 Like

What do you mean exactly? There are 7 players, so we see the correct shape of the output?
fit.posterior.is_best.mean(dim=('chain', 'draw'))) is the correct syntax as I specify both chain and draw.
Its basically the equivalent of np.mean(a, axis=0 & 1)

It doesn’t make sense that the second player has a zero probability of being 2nd or 3rd. I think this is also causing the simplex to fail?

This is the output of fit.posterior.is_2nd_best.mean(dim=('draw'))

This is the output of fit.posterior.is_2nd_best.mean(dim=('chain'))