Sum of event probability is not equal to one

HJAM24 · June 26, 2023, 8:19am

Hi all,

I have a number of players and I am interested in their skill. In order to determine if the model makes sense I want to build some checks.

Inspired by this blog post by Bob Carpenter, I wrote the following code:

generated quantities {
  int<lower=1, upper=n_players> rank[n_players];   
  int dsc[n_players] = sort_indices_desc(skill);
  int<lower=0, upper=1> is_best[n_players];
  int<lower=0, upper=1> is_2nd_best[n_players];
  int<lower=0, upper=1> is_3rd_best[n_players];
  
  for (player in 1:n_players) {
      rank[dsc[player]] = player;
      is_best[player] = (rank[player] == 1);
      is_2nd_best[player] = (rank[player] == 2);
      is_3rd_best[player] = (rank[player] == 3);
  }
}

The sum of is_best is 1, as expected (of the n_players, exactly one should be the best).
However, this is not the case for is_2nd_best and is_3rd_best.
I don’t understand why the above works for is_best but not for the other two variables.
I also tried to define the variables as a simplex, but then the script crashes (again at is_2nd_best and is_3rd_best).

Please show me what’s wrong. Thank you

avehtari · June 26, 2023, 11:01am

Are you running the generated quantities separately after sampling? It’s a known issue that Stan writes to CSV with 6-digit accuracy, while the simplex sum-to-one check uses 8-digit accuracy. You can increase the number of digits saved, but to give specific instructions it would help to know which interface you are using.
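
As a rough illustration of the precision issue described above, here is a minimal sketch in plain Python. It assumes "6 digit accuracy" means 6 significant digits and that the check uses an absolute tolerance of 1e-8; the helper `round_sig` and the example vector are made up for illustration and are not Stan's actual code.

```python
def round_sig(x, digits=6):
    """Round x to the given number of significant digits (hypothetical helper)."""
    return float(f"{x:.{digits}g}")

# A valid simplex in full double precision.
theta = [1 / 3, 1 / 3, 1 / 3]

# What would be read back if each entry were stored with 6 significant digits.
written = [round_sig(t) for t in theta]

# Assumed tolerance, standing in for the "8 digit" check mentioned above.
TOLERANCE = 1e-8

deviation = abs(sum(written) - 1.0)
print(written)                # each entry is 0.333333
print(deviation > TOLERANCE)  # the rounded values miss the tolerance
```

The deviation here is on the order of 1e-6, so this mechanism can only explain sums that are very close to one.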

HJAM24 · June 26, 2023, 11:18am

No, the generated quantities code is not run as a separate script.
The difference is much larger than that. I believe the sum was ~0.70.

jsocolar · June 26, 2023, 2:54pm

Regardless of whether the upstream parts of the model are doing what you want, is_best, is_2nd_best, etc. can only contain the integers 0 and 1. Thus, it is impossible that the sum of is_2nd_best could ever be 0.7. Perhaps there is an error in how you take the sum?
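
One way to check how the sum is taken is to compare the per-draw sums with the sum of the per-player means. The sketch below uses a small made-up table of 0/1 indicators (not the poster's output) in plain Python; with xarray the same comparison would be done on the posterior arrays. Because averaging is linear, the sum of the per-player means equals the mean of the per-draw sums, so a total below one would suggest that some draws have no player flagged at all.

```python
# Hypothetical indicator draws: draws[d][p] is 1 if player p is flagged in draw d.
draws = [
    [0, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]
n_draws = len(draws)
n_players = len(draws[0])

# Sum across players within each draw: always an integer.
per_draw_sums = [sum(row) for row in draws]

# Posterior mean of each player's indicator, i.e. the estimated probability.
per_player_means = [
    sum(row[p] for row in draws) / n_draws for p in range(n_players)
]

sum_of_means = sum(per_player_means)
mean_of_sums = sum(per_draw_sums) / n_draws

print(per_draw_sums)               # one flagged player in every draw
print(per_player_means)
print(sum_of_means, mean_of_sums)  # the two totals agree
```

Looking at the per-draw sums directly shows whether every draw flags exactly one player.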

HJAM24 · June 26, 2023, 4:35pm

Yes, I agree, so I don’t understand what’s going wrong.

This is how it looks in my notebook. In both cases it’s the second player that has a zero probability of being 2nd / 3rd.

ssp3nc3r · June 26, 2023, 6:44pm

Looks like you are summing the means, not across draws.

HJAM24 · June 27, 2023, 5:15pm

What do you mean exactly? There are 7 players, so isn’t the shape of the output correct?
fit.posterior.is_best.mean(dim=('chain', 'draw')) is the correct syntax, as I specify both chain and draw.
It’s basically the equivalent of np.mean(a, axis=(0, 1)).

It doesn’t make sense that the second player has a zero probability of being 2nd or 3rd. I think this is also what causes the simplex to fail.
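
A possible explanation, which nobody in the thread confirms, is the order of operations inside the single loop: rank[dsc[player]] is filled in sorted order, while is_2nd_best[player] reads rank[player] in the same iteration, possibly before that entry has been assigned. The plain-Python sketch below mimics the loop with 0-based indices. Unassigned entries are modelled as None, which is an assumption about the mechanism and not what Stan actually stores, and the function names and skill values are made up.

```python
UNSET = None  # stands in for a rank entry that has not been assigned yet

def sort_indices_desc(skill):
    """0-based indices that sort skill in descending order."""
    return sorted(range(len(skill)), key=lambda p: -skill[p])

def second_best_single_loop(skill):
    """Mimics the original loop: rank is filled and read in the same pass."""
    n = len(skill)
    dsc = sort_indices_desc(skill)
    rank = [UNSET] * n
    is_2nd = [0] * n
    for k in range(n):
        rank[dsc[k]] = k + 1             # rank of the k-th best player
        is_2nd[k] = int(rank[k] == 2)    # rank[k] may still be UNSET here
    return is_2nd

def second_best_two_loops(skill):
    """Fills rank completely before any indicator reads it."""
    n = len(skill)
    dsc = sort_indices_desc(skill)
    rank = [UNSET] * n
    for k in range(n):
        rank[dsc[k]] = k + 1
    return [int(r == 2) for r in rank]

skill = [0.5, 0.9, 0.1]  # the first player is second best
print(second_best_single_loop(skill))  # no player is flagged
print(second_best_two_loops(skill))    # the first player is flagged
```

In this sketch the best player is always flagged correctly, because its rank is assigned in the first iteration, which would be consistent with is_best summing to one while the other indicators fall short. If this is the cause, computing rank in its own loop before the indicators would be one fix.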

HJAM24 · June 27, 2023, 5:19pm

This is the output of fit.posterior.is_2nd_best.mean(dim=('draw'))

This is the output of fit.posterior.is_2nd_best.mean(dim=('chain'))
