Constraining Output of Multinomial Distribution

mathDR · September 8, 2019, 5:01pm

I have a general question regarding constraining the output of a multinomial regression.

Assume the model is as follows: we have 5 types of fruit, each having N_k, k = 1,\ldots,5 pieces to choose from. If I select N total pieces of fruit, I would like to model how many of each fruit I am going to choose.

For this, I am going to use a Multinomial distribution.

The issue is, at each “time-step” I can only choose a maximum of N_k of the k th type of fruit. Since I know N_k for each time-step, I would like to constrain the model such that the multinomial doesn’t return more of the bounded amount of fruit.

Is there a way of “clipping” the multinomial to do this? Assume any prior for it needed. That is, if a Dirichlet prior can have bounded output probabilities, then that would work. If a Generalized Dirichlet prior can bound each Beta distribution, then I’ll use that, etc.

Note that the number of fruit at each time-step is variable, but known when doing inference (so I cannot make a constrained input vector with hard bounds in Stan)

bgoodri · September 8, 2019, 6:12pm

I think you are going to have to truncate the multinomial distribution yourself. So the likelihood contribution of the k-th time-step is multinomial_lpdf(y[k] | theta) - log1m(sum_of_precluded_probabilities).

LucC · September 10, 2019, 6:48am

I’m not sure if I understood the application correctly, but if you want to “bound” the number of drawn fruits from each type because you know how many of each type is available, this is basically taking draws without replacement. The multinomial distribution assumes draws with replacement (like the binomial). The multivariate hypergeometric (https://en.m.wikipedia.org/wiki/Hypergeometric_distribution#Multivariate_hypergeometric_distribution) is maybe the thing you are looking for. Like the hypergeometric districtribution, it assumes drawing without replacement. Parameters are the number of available items to draw from, the number available for each type, and the number of items drawn. Those are all integers, so if any of those are unknown or only partly observed in the data, they would need to be integrated over in Stan.

mathDR · September 10, 2019, 4:53pm

Thanks! This is exactly what I needed!

LucC · September 10, 2019, 5:05pm

Perfect! Please let us Stanimals here know how it turned out!

mathDR · September 10, 2019, 5:44pm

Am I correct in assuming the multivariate hypergeometric distribution is not implemented in Stan?

bgoodri · September 10, 2019, 6:18pm

Yes

mathDR · September 11, 2019, 2:58am

After further investigation, the multivariate hypergeometric distribution isn’t really the solution.

In particular, for inference, it would attempt to learn the N_k values for each fruit. A quantity I already know. Furthermore, I know all of the values of the distribution, that is, I have known data for what was drawn, and I know the population of each of the categories.

One thing I thought of is that, since the dirichlet-multinomial is a conjugate prior to the multivariate hypergeometric distribution, then I could feasible learn the probabilities that induce my population counts at each timestep. But this isn’t really what I want either.

Thoughts?

LucC · September 11, 2019, 6:00am

To be able to help you: what is it exactly that you towant make inferences about? Recap of what you described

This can be addressed with Monte Carl simulation, as you have all the information. There are no free parameters.

But maybe you are thinking of something more specific (can’t tell from original post). It would therefore be useful if you would describe the architecture of the data and what you would like to learn from it. I.e. What exactly is observed in the data (and any potential hierarchical structure)?

mathDR · September 11, 2019, 3:36pm

For clarity, I will expound on the problem I am trying to solve (it is a toy problem to be sure, but has all of the components of the solution I desire).

So the problem is to forecast sales of fruit at a fruit stand. The fruit stand sells apples, pears, oranges, bananas and pineapples. Assume we have a wholesale buyer that will buy a block of fruit each day, say N items. What we do not know is how much of each type of fruit the buyer will purchase. The assumption is that there is a “static” probability that the buyer will use to choose fruit, but if there isn’t enough of that type to fulfill their demand, they will switch to another type of fruit, until they get their N pieces of fruit.

Our historical data is: the total amount of each fruit on each day X_i(t), i=1,\ldots,5. Also, we have the amount N(t) that the buyer purchased each day.

So, I would like to know, given current levels of fruit stock, and a forecasted N(t) how much of each type of fruit will be sold on a given day.

Some thoughts: if the amounts of fruit we “large enough” each day, then I could model the entire system as Multinomial, and compute the probability of selling fruit i on a given day: p_i(t). The issue is sometimes I don’t have enough of a type of fruit that the buyer would select if I had it, so I put some of the other fruit into the sale (since I have to sell exactly N(t) items on that day). So I would also like to learn the correlative structure of how different fruit sells as each type “runs out” – that is, if I am low on apples, the buyer selects more pears, etc. I assume that the way the selection process works is that there may be some seasonal properties, both to the probabilities p_i(t) but not to how the selection process changes when fruit runs out.

One complicating factor to this problem is that I place a restocking order every 2 weeks say, and I would like to use my forecast to plan what to buy.

Does this clarity the problem?

LucC · September 11, 2019, 5:47pm

Yes this clarifies a lot. As expected, it doesn’t make things simpler haha.

From your explanation, I’m distilling the following unobserved variable: the true stock available for the wholesale buyer to buy. This poses a challenge: modelling an integer latent variable requirea integrating over all potential states of thentlatent variable, which is expensive if there are many latent states. And in this case, there are many many many, I imagine even for a modest total available stock N. In Stan you can’t integrate by sampling this latent state, but you have to considtr each potential state and assign some probability to it, either based on a prior or a more elaborate model. So that’s the first challenge.

The second challenge is that you then want to have a model for 1) the wholesales buyer’s / fruit stand salesman’s preference for buying fruit types, conditional on available stock, and 2) the customers’ preference conditional on what the wholesale buyer / fruit stand has on offer. Now, if I understand correctly, you know exactly what the wholesale buyer / fruit stand bought. So far so good.

Then third, you want to make inference about the wholesale buyer’s / fruit stand salesman’s preference or strategy for buying fruit types to depens on sales during the last two weeks. If I understand correctly, the actual (true) sales are also known. That would be great so as not to have also integrate over all possible states of some unknown sales pattern.

Have I got it right so far? If you want to pursue this, I’d start “simple” with a toy model assuming you know the true stock available to the fruit stand, code up everything from that point forward in the sales process. And if that makes sense, relax the assumption of knowing the true stock by integrating over all potential stock configurations. I guess the latter will be very computationally expensive.

Because preferential buying is involved when the fruit stand stocks, you can’t use a multivariate hypergeometric distribution, as it allows no preference (it’s purely random). But it should be possible to generalise this by weighting certain types of fruits by means of a simplex vector from a dirichlet distribution. You’d have to figure out the PMF yourself though. I wouldn’t be surprised if it turns out to be very similar to a compound dirichlet-multinomial distribution.

I now also realise that this last step in the sales process (ie customers buying) does not have any parameters because you basically have perfect knowledge of the situation. The only role of the data on sales to customers is to inform the fruit stands patterns in stocking preferences, correct?

LucC · September 11, 2019, 6:00pm

After a quick dive into Wikipedia: Noncentral hypergeometric distributions - Wikipedia

This version of the hypergeometric distribution allows weighted sampling of marbles of difterent colors. What you need is the multivariate version of this.

mathDR · September 15, 2019, 9:56pm

Update, it seems the “correct” distribution to use in this case is the Multivariate Wallenius’ noncentral hypergeometric distribution: https://en.wikipedia.org/wiki/Wallenius’_noncentral_hypergeometric_distribution#Software_available

Incorporating a _lpmf and _rng functions seems to be a bit complicated, but I will take a stab at the _lpmf by using numerical integration. Can someone point me to an example that uses numerical integration?

Otherwise, I can use a simple Simpson’s Method on a fixed grid or something.

bgoodri · September 15, 2019, 11:06pm

github.com

stan-dev/stan/blob/fa03aece01b28accdbb7dc0c86adcedd9dd39618/src/test/test-models/good/integrate_1d_good.stan#L21


  real theta_d[3];
  real int_foo1 = integrate_1d(foo, 0.2, 1.3, theta_d, x_r, x_i, 0.01);
}
parameters {
  real lb;
  real ub;
  real theta[3];
}
model {
  // all eight instantiations of (lb, ub, theta);
  real int_foo2 = integrate_1d(foo, 0.2, 1.3, theta, x_r, x_i, 0.01);
  real int_foo3 = integrate_1d(foo, lb, 1.3, theta, x_r, x_i, 0.01);
  real int_foo4 = integrate_1d(foo, 0.2, ub, theta, x_r, x_i, 0.01);
  real int_foo5 = integrate_1d(foo, lb, ub, theta, x_r, x_i, 0.01);


  // redundant test for int_foo6 given transformed data test
  real int_foo6 = integrate_1d(foo, 0.2, 1.3, theta_d, x_r, x_i, 0.01);
  real int_foo7 = integrate_1d(foo, lb, 1.3, theta_d, x_r, x_i, 0.01);
  real int_foo8 = integrate_1d(foo, 0.2, ub, theta_d, x_r, x_i, 0.01);
  real int_foo9 = integrate_1d(foo, lb, ub, theta_d, x_r, x_i, 0.01);
}

LucC · September 16, 2019, 11:03am

Are you sure you don’t want to use the Fisher’s non-central etc. The wallenius would assume drawing of individual pieces of fruit.

mathDR · September 16, 2019, 2:36pm

I honestly don’t know. Is good intuition that Fisher’s is choosing everything at once and Wallenius choosing one at a time?

So for my toy problem it would be Fisher, but if 10 people came throughout the day and purchased fruit, then it would be Wallenius?

A good example showing the applications of the different distributions would be fantastic to build intuition.

Thanks!

LucC · September 16, 2019, 4:04pm

I completely agree that consumer buying behaviour is closer to a Wallenius process (let’s call it that for now). However, I was thinking of the pattern of the fruit stall buying from the whole saler, as I understood you also wanted to model that. The latter process will be somewhere between a Wallenius and a Fisher process, as the fruit stall will for sure not buy fruits piece by piece, but more likely in one go, but after having looked at the availability of the different types of fruit (meaning that the draws for different fruit types are not completely independent). But yes, agreed, consumer behaviour would be much better described by a Wallenius distribution than a Fisher.

mathDR · September 16, 2019, 4:40pm

Agreed. I think the Fisher is the direction I want to go. Also, it is interesting to think about the “aggregate” rollup of a day’s worth of sales. If you don’t have data for each sale, but only the inventory at the start and end of the day, then, as you said, this is somewhat in between the two non-central distributions.

Thanks for the feedback!

mathDR · October 3, 2019, 2:24am

So I have decided to try to do inference on the Multivariate Wallenius distribution. There is a great amount of work on this distribution by Agner Fog (https://www.agner.org/random/nchyp1.pdf).

Question: the integral in the _pmf is very numerically unstable and needs special treatment to compute properly. Agner has a C++ library at https://www.agner.org/random/ . Instead of me trying to implement this all myself, is there a good procedure to just call the C++ code directly from Stan?

I could copy paste everything into my functions block, but that is C++ not Stan.

Any advice would be greatly appreciated!

bgoodri · October 3, 2019, 3:51am

https://cran.r-project.org/web/packages/rstan/vignettes/external.html

But the integrate_1d function in Stan / Boost is designed for difficult integrals of analytic functions. I would try that first.

Topic		Replies	Views
Biased urn problem & Wallenius' noncentral hypergeometric distribution General	23	1783	December 13, 2023
Generalized Dirichlet Distribution as a prior Modeling	17	2730	July 2, 2025
Trying to generalise the dirichlet-multinomial (non-analytical) framework (replacing Dirichlet with other distributions) Modeling techniques , bioinformatics , dirichlet-multinomial	23	2650	November 5, 2018
Two parameter distribution over the simplex Modeling	23	2854	April 13, 2018
Dirichlet_log: probabilities is not a valid simplex Modeling	26	3624	September 27, 2017

Constraining Output of Multinomial Distribution

Related topics