Constraining Output of Multinomial Distribution

I have a general question regarding constraining the output of a multinomial regression.

Assume the model is as follows: we have 5 types of fruit, each having N_k, k = 1,\ldots,5 pieces to choose from. If I select N total pieces of fruit, I would like to model how many of each fruit I am going to choose.

For this, I am going to use a Multinomial distribution.

The issue is, at each “time-step” I can only choose a maximum of N_k of the k th type of fruit. Since I know N_k for each time-step, I would like to constrain the model such that the multinomial doesn’t return more of the bounded amount of fruit.

Is there a way of “clipping” the multinomial to do this? Assume any prior for it needed. That is, if a Dirichlet prior can have bounded output probabilities, then that would work. If a Generalized Dirichlet prior can bound each Beta distribution, then I’ll use that, etc.

Note that the number of fruit at each time-step is variable, but known when doing inference (so I cannot make a constrained input vector with hard bounds in Stan)

I think you are going to have to truncate the multinomial distribution yourself. So the likelihood contribution of the k-th time-step is multinomial_lpdf(y[k] | theta) - log1m(sum_of_precluded_probabilities).

I’m not sure if I understood the application correctly, but if you want to “bound” the number of drawn fruits from each type because you know how many of each type is available, this is basically taking draws without replacement. The multinomial distribution assumes draws with replacement (like the binomial). The multivariate hypergeometric (https://en.m.wikipedia.org/wiki/Hypergeometric_distribution#Multivariate_hypergeometric_distribution) is maybe the thing you are looking for. Like the hypergeometric districtribution, it assumes drawing without replacement. Parameters are the number of available items to draw from, the number available for each type, and the number of items drawn. Those are all integers, so if any of those are unknown or only partly observed in the data, they would need to be integrated over in Stan.

1 Like

Thanks! This is exactly what I needed!

Perfect! Please let us Stanimals here know how it turned out!

1 Like

Am I correct in assuming the multivariate hypergeometric distribution is not implemented in Stan?

Yes

1 Like

After further investigation, the multivariate hypergeometric distribution isn’t really the solution.

In particular, for inference, it would attempt to learn the N_k values for each fruit. A quantity I already know. Furthermore, I know all of the values of the distribution, that is, I have known data for what was drawn, and I know the population of each of the categories.

One thing I thought of is that, since the dirichlet-multinomial is a conjugate prior to the multivariate hypergeometric distribution, then I could feasible learn the probabilities that induce my population counts at each timestep. But this isn’t really what I want either.

Thoughts?

To be able to help you: what is it exactly that you towant make inferences about? Recap of what you described

This can be addressed with Monte Carl simulation, as you have all the information. There are no free parameters.

But maybe you are thinking of something more specific (can’t tell from original post). It would therefore be useful if you would describe the architecture of the data and what you would like to learn from it. I.e. What exactly is observed in the data (and any potential hierarchical structure)?

2 Likes

For clarity, I will expound on the problem I am trying to solve (it is a toy problem to be sure, but has all of the components of the solution I desire).

So the problem is to forecast sales of fruit at a fruit stand. The fruit stand sells apples, pears, oranges, bananas and pineapples. Assume we have a wholesale buyer that will buy a block of fruit each day, say N items. What we do not know is how much of each type of fruit the buyer will purchase. The assumption is that there is a “static” probability that the buyer will use to choose fruit, but if there isn’t enough of that type to fulfill their demand, they will switch to another type of fruit, until they get their N pieces of fruit.

Our historical data is: the total amount of each fruit on each day X_i(t), i=1,\ldots,5. Also, we have the amount N(t) that the buyer purchased each day.

So, I would like to know, given current levels of fruit stock, and a forecasted N(t) how much of each type of fruit will be sold on a given day.

Some thoughts: if the amounts of fruit we “large enough” each day, then I could model the entire system as Multinomial, and compute the probability of selling fruit i on a given day: p_i(t). The issue is sometimes I don’t have enough of a type of fruit that the buyer would select if I had it, so I put some of the other fruit into the sale (since I have to sell exactly N(t) items on that day). So I would also like to learn the correlative structure of how different fruit sells as each type “runs out” – that is, if I am low on apples, the buyer selects more pears, etc. I assume that the way the selection process works is that there may be some seasonal properties, both to the probabilities p_i(t) but not to how the selection process changes when fruit runs out.

One complicating factor to this problem is that I place a restocking order every 2 weeks say, and I would like to use my forecast to plan what to buy.

Does this clarity the problem?

Yes this clarifies a lot. As expected, it doesn’t make things simpler haha.

From your explanation, I’m distilling the following unobserved variable: the true stock available for the wholesale buyer to buy. This poses a challenge: modelling an integer latent variable requirea integrating over all potential states of thentlatent variable, which is expensive if there are many latent states. And in this case, there are many many many, I imagine even for a modest total available stock N. In Stan you can’t integrate by sampling this latent state, but you have to considtr each potential state and assign some probability to it, either based on a prior or a more elaborate model. So that’s the first challenge.

The second challenge is that you then want to have a model for 1) the wholesales buyer’s / fruit stand salesman’s preference for buying fruit types, conditional on available stock, and 2) the customers’ preference conditional on what the wholesale buyer / fruit stand has on offer. Now, if I understand correctly, you know exactly what the wholesale buyer / fruit stand bought. So far so good.

Then third, you want to make inference about the wholesale buyer’s / fruit stand salesman’s preference or strategy for buying fruit types to depens on sales during the last two weeks. If I understand correctly, the actual (true) sales are also known. That would be great so as not to have also integrate over all possible states of some unknown sales pattern.

Have I got it right so far? If you want to pursue this, I’d start “simple” with a toy model assuming you know the true stock available to the fruit stand, code up everything from that point forward in the sales process. And if that makes sense, relax the assumption of knowing the true stock by integrating over all potential stock configurations. I guess the latter will be very computationally expensive.

Because preferential buying is involved when the fruit stand stocks, you can’t use a multivariate hypergeometric distribution, as it allows no preference (it’s purely random). But it should be possible to generalise this by weighting certain types of fruits by means of a simplex vector from a dirichlet distribution. You’d have to figure out the PMF yourself though. I wouldn’t be surprised if it turns out to be very similar to a compound dirichlet-multinomial distribution.

I now also realise that this last step in the sales process (ie customers buying) does not have any parameters because you basically have perfect knowledge of the situation. The only role of the data on sales to customers is to inform the fruit stands patterns in stocking preferences, correct?

After a quick dive into Wikipedia: https://en.m.wikipedia.org/wiki/Noncentral_hypergeometric_distributions

This version of the hypergeometric distribution allows weighted sampling of marbles of difterent colors. What you need is the multivariate version of this.

1 Like

Update, it seems the “correct” distribution to use in this case is the Multivariate Wallenius’ noncentral hypergeometric distribution: https://en.wikipedia.org/wiki/Wallenius’_noncentral_hypergeometric_distribution#Software_available

Incorporating a _lpmf and _rng functions seems to be a bit complicated, but I will take a stab at the _lpmf by using numerical integration. Can someone point me to an example that uses numerical integration?

Otherwise, I can use a simple Simpson’s Method on a fixed grid or something.

1 Like
1 Like

Are you sure you don’t want to use the Fisher’s non-central etc. The wallenius would assume drawing of individual pieces of fruit.

I honestly don’t know. Is good intuition that Fisher’s is choosing everything at once and Wallenius choosing one at a time?

So for my toy problem it would be Fisher, but if 10 people came throughout the day and purchased fruit, then it would be Wallenius?

A good example showing the applications of the different distributions would be fantastic to build intuition.

Thanks!

I completely agree that consumer buying behaviour is closer to a Wallenius process (let’s call it that for now). However, I was thinking of the pattern of the fruit stall buying from the whole saler, as I understood you also wanted to model that. The latter process will be somewhere between a Wallenius and a Fisher process, as the fruit stall will for sure not buy fruits piece by piece, but more likely in one go, but after having looked at the availability of the different types of fruit (meaning that the draws for different fruit types are not completely independent). But yes, agreed, consumer behaviour would be much better described by a Wallenius distribution than a Fisher.

1 Like

Agreed. I think the Fisher is the direction I want to go. Also, it is interesting to think about the “aggregate” rollup of a day’s worth of sales. If you don’t have data for each sale, but only the inventory at the start and end of the day, then, as you said, this is somewhat in between the two non-central distributions.

Thanks for the feedback!