In the traditional binomial model of trials and successes, the reward for each trial can be considered as 1 or 0. For example, in modeling baseball players, you can think of at-bats (AB) as the number of trials and hits as the number of successes. In this particular case, the outcome of each trial is in {0, 1}.

What is the right way to model the case where the outcome of each trial can be in {0,…, N}, where N is an integer? Is there a use case or example one might read to understand the right approach in Stan (e.g., the right hyperpriors to use)?

EDIT:

Let's say we have a dice game where we roll a die and receive the face value ({1,…,6}) as the reward. I want to figure out which die is the best to roll (to maximize my reward).
die number | # of total rolls | total reward collected
1 | 10 | 3
2 | 100 | 302
3 | 1 | 6

If the observations are counts of the number of times each integer between 0 and N occurs, then use a multinomial. If, on the other hand, y[i] is like bowling and each observation is just some score between 0 and N, then you can just use the simplex type in Stan. It would be something like

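data {
  int<lower=1> N;                        // number of distinct outcomes
  int<lower=1> len_y;                    // number of observed trials
  // Outcomes must be coded 1..N to index pi; add 1 to scores that start at 0.
  // Older Stan releases write this as: int<lower=1,upper=N> y[len_y];
  array[len_y] int<lower=1, upper=N> y;
}
parameters {
  simplex[N] pi;                         // pi[k] = probability of outcome k
}
model {
  // Categorical log-likelihood; the vector of log probabilities is summed.
  target += log(pi[y]);
}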

I am having difficulty mapping the program you suggested to this problem. It seems like the model assumes the observations are the individual rewards, whereas in the problem as stated only the total reward collected is observed.

If the data are like the number of times that the integers one through six come up on independent rolls, then the multinomial distribution is appropriate.
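For that case, a minimal sketch might look like the following. The names K, counts, and expected_reward are made up for illustration, and the uniform Dirichlet prior is only an assumed default, not a recommendation from this thread.

```stan
data {
  int<lower=2> K;                  // number of faces (6 for a standard die)
  array[K] int<lower=0> counts;    // counts[k] = number of times face k came up
}
parameters {
  simplex[K] pi;                   // pi[k] = probability of face k
}
model {
  pi ~ dirichlet(rep_vector(1, K));  // assumed prior: uniform over the simplex
  counts ~ multinomial(pi);
}
generated quantities {
  // Expected reward per roll when the reward is the face value.
  real expected_reward = 0;
  for (k in 1:K) {
    expected_reward += k * pi[k];
  }
}
```

Fitting one such model per die and comparing the posterior draws of expected_reward would be one way to rank the dice.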

The tricky part is that I don’t know the number of times that the integers one through six come up on independent rolls; I just know their total. Should the corresponding probabilities {p_1, …, p_6} all have priors?

Furthermore, if I am uncertain about the total possible number of outcomes, should I also put a prior on N, i.e., make N a random variable?

Stan won’t let you declare an integer parameter. If all you know is that the sum of some number of rolls is 42, then you need a giant simplex vector for all feasible ways to get to 42, which is going to be unwieldy at best.
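To make the "all feasible ways" concrete: assuming the number of rolls n is known and the rolls are independent with face probabilities π_1, …, π_K, the probability of a given total can be built up one roll at a time instead of enumerating every combination. The symbols S_n (the sum of n rolls), t (a total), and K (the number of faces) are introduced here only for illustration.

```latex
P(S_1 = t \mid \pi) = \pi_t \quad (1 \le t \le K),
\qquad
P(S_n = t \mid \pi) = \sum_{k=1}^{K} \pi_k \, P(S_{n-1} = t - k \mid \pi),
```

where any term with a total outside the feasible range n - 1, …, (n - 1)K counts as zero. The observed total then contributes log P(S_n = t | π) to the log density.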

You have to marginalize out the number of terms to be added. You cannot declare it as an unknown in the parameters block because the HMC-based algorithms require the log-kernel be differentiable with respect to the parameters.
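As a rough sketch of what marginalizing can look like in the simpler situation from the dice table, where the number of rolls is known and only the total is observed, the unobserved individual rolls can be summed out by repeated convolution. Everything named here (sum_pmf, K, D, n_rolls, total) is hypothetical, and the uniform Dirichlet prior is an assumption. An unknown number of rolls would need an additional sum over its possible values.

```stan
functions {
  // Returns p with p[t] = P(sum of n iid rolls = t) for t in 1..n*K.
  vector sum_pmf(vector pi, int n) {
    int K = rows(pi);
    vector[n * K] p = rep_vector(0, n * K);
    p[1:K] = pi;                          // distribution of a single roll
    for (r in 2:n) {
      vector[n * K] q = rep_vector(0, n * K);
      for (s in (r - 1):((r - 1) * K)) {  // sums reachable after r - 1 rolls
        for (k in 1:K) {
          q[s + k] += p[s] * pi[k];       // add one more roll showing face k
        }
      }
      p = q;
    }
    return p;
  }
}
data {
  int<lower=2> K;                   // number of faces
  int<lower=1> D;                   // number of dice
  array[D] int<lower=1> n_rolls;    // rolls observed per die
  array[D] int<lower=1> total;      // total reward per die
}
parameters {
  array[D] simplex[K] pi;           // face probabilities for each die
}
model {
  for (d in 1:D) {
    vector[n_rolls[d] * K] p = sum_pmf(pi[d], n_rolls[d]);
    pi[d] ~ dirichlet(rep_vector(1, K));  // assumed prior
    target += log(p[total[d]]);           // marginal likelihood of the total
  }
}
```

Each total must lie between n_rolls and K * n_rolls for this to work. A row such as 10 rolls with a total of 3 has probability zero when faces are coded 1..K, so it would need rewards coded from 0 or a different model. A single total also carries little information about K separate probabilities, so the posterior for pi will lean heavily on the prior.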

Those are good references; basically, I would need to do some algebra.

On another note, what if the rewards can be any positive real number? In that case, it is similar to the classical M/M/1 or M/G/1 queue, where each observation records the number of people who arrived and the number of people served during a period of time (you can think of the reward as 1/(# of people served)). I searched quite a bit online and didn’t see any Stan implementation or anything similar. Would you by any chance know of any material on this?

I don’t. If the reward is the reciprocal of an integer, it cannot be just any real number, so you would have the same issue of needing to marginalize them out.

If it’s a longstanding model that people have used, you can usually find the marginalizations in the bits of a 1980s paper where they fit maximum likelihood with EM or a generic continuous optimizer—they always had to marginalize the discrete parameters in the same way as is required for Stan.