# Constraining Output of Multinomial Distribution

Yes, this clarifies a lot. As expected, it doesn’t make things simpler haha.

From your explanation, I’m distilling the following unobserved variable: the true stock available for the wholesale buyer to buy. This poses a challenge: modelling an integer latent variable requires integrating over all potential states of the latent variable, which is expensive if there are many latent states. And in this case there are very, very many, I imagine, even for a modest total available stock N. In Stan you can’t integrate by sampling this latent state; you have to consider each potential state and assign some probability to it, either based on a prior or a more elaborate model. So that’s the first challenge.

The second challenge is that you then want to have a model for 1) the wholesales buyer’s / fruit stand salesman’s preference for buying fruit types, conditional on available stock, and 2) the customers’ preference conditional on what the wholesale buyer / fruit stand has on offer. Now, if I understand correctly, you know exactly what the wholesale buyer / fruit stand bought. So far so good.

Then third, you want the wholesale buyer’s / fruit stand salesman’s preference or strategy for buying fruit types to depend on sales during the last two weeks. If I understand correctly, the actual (true) sales are also known. That would be great, so as not to also have to integrate over all possible states of some unknown sales pattern.

Have I got it right so far? If you want to pursue this, I’d start “simple” with a toy model assuming you know the true stock available to the fruit stand, code up everything from that point forward in the sales process. And if that makes sense, relax the assumption of knowing the true stock by integrating over all potential stock configurations. I guess the latter will be very computationally expensive.

Because preferential buying is involved when the fruit stand stocks up, you can’t use a multivariate hypergeometric distribution, as it allows no preference (it’s purely random). But it should be possible to generalise it by weighting certain types of fruit by means of a simplex vector from a Dirichlet distribution. You’d have to figure out the PMF yourself, though. I wouldn’t be surprised if it turns out to be very similar to a compound Dirichlet-multinomial distribution.
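For intuition, the weighted sampling-without-replacement process we’re describing is easy to simulate. Here is a minimal Python sketch (the helper name `wallenius_draw` is mine, not from any library): each remaining item’s chance of being picked is proportional to its category weight, and the urn shrinks after every draw.

```python
import random

def wallenius_draw(m, omega, n, rng):
    """Draw n items one at a time without replacement; each remaining item
    of category i is chosen with probability proportional to omega[i].
    m: counts per category, omega: positive preference weights."""
    m = list(m)
    drawn = [0] * len(m)
    for _ in range(n):
        # category odds shift as the urn empties
        weights = [mi * wi for mi, wi in zip(m, omega)]
        i = rng.choices(range(len(m)), weights=weights)[0]
        m[i] -= 1
        drawn[i] += 1
    return drawn

# e.g. three fruit types, strong preference for the first
counts = wallenius_draw([10, 10, 10], [5.0, 1.0, 1.0], 12, random.Random(1))
```

With all weights equal this reduces to the ordinary multivariate hypergeometric, which is why a weighted generalisation of that distribution is the natural candidate.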

I now also realise that this last step in the sales process (i.e. customers buying) does not have any parameters, because you basically have perfect knowledge of the situation. The only role of the data on sales to customers is to inform the fruit stand’s patterns in stocking preferences, correct?

After a quick dive into Wikipedia: https://en.m.wikipedia.org/wiki/Noncentral_hypergeometric_distributions

This version of the hypergeometric distribution allows weighted sampling of marbles of different colors. What you need is the multivariate version of this.

1 Like

Update: it seems the “correct” distribution to use in this case is the Multivariate Wallenius’ noncentral hypergeometric distribution: https://en.wikipedia.org/wiki/Wallenius’_noncentral_hypergeometric_distribution#Software_available

Incorporating `_lpmf` and `_rng` functions seems to be a bit complicated, but I will take a stab at the `_lpmf` using numerical integration. Can someone point me to an example that uses numerical integration?

Otherwise, I can use a simple Simpson’s Method on a fixed grid or something.
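To sketch what that would look like, here is a fixed-grid composite Simpson’s rule applied to Fog’s integral representation of the multivariate Wallenius PMF, in Python rather than Stan and without any of the stability treatment; `wallenius_pmf` is my own helper name:

```python
from math import comb

def wallenius_pmf(y, m, omega, n_grid=2001):
    """Multivariate Wallenius noncentral hypergeometric PMF via Fog's
    integral representation:
        P(y) = prod_i C(m_i, y_i) * I,
        I    = int_0^1 prod_i (1 - t**(omega_i / D))**y_i dt,
        D    = sum_i omega_i * (m_i - y_i),
    approximated with composite Simpson's rule (n_grid must be odd)."""
    D = sum(w * (mi - yi) for w, mi, yi in zip(omega, m, y))
    binoms = 1
    for mi, yi in zip(m, y):
        binoms *= comb(mi, yi)

    def f(t):
        p = 1.0
        for w, yi in zip(omega, y):
            p *= (1.0 - t ** (w / D)) ** yi
        return p

    h = 1.0 / (n_grid - 1)
    s = f(0.0) + f(1.0)
    for k in range(1, n_grid - 1):
        s += (4 if k % 2 else 2) * f(k * h)  # Simpson weights 4, 2, 4, ...
    return binoms * s * h / 3.0

# sanity check: with equal weights this reduces to the ordinary multivariate
# hypergeometric, e.g. P(y=[1,1] | m=[2,2], 2 drawn) = 2*2 / C(4,2) = 2/3
p = wallenius_pmf([1, 1], [2, 2], [1.0, 1.0])
```

Working directly with the raw integrand like this underflows for realistic counts, which is exactly why the `_lpmf` needs log-scale treatment.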

1 Like

Are you sure you don’t want to use Fisher’s noncentral version instead? The Wallenius would assume drawing individual pieces of fruit.

I honestly don’t know. Is it good intuition that Fisher’s is choosing everything at once and Wallenius is choosing one at a time?

So for my toy problem it would be Fisher, but if 10 people came throughout the day and purchased fruit, then it would be Wallenius?

A good example showing the applications of the different distributions would be fantastic to build intuition.

Thanks!

I completely agree that consumer buying behaviour is closer to a Wallenius process (let’s call it that for now). However, I was thinking of the pattern of the fruit stall buying from the wholesaler, as I understood you also wanted to model that. That process will sit somewhere between a Wallenius and a Fisher process: the fruit stall will surely not buy fruit piece by piece, but more likely in one go, yet only after having looked at the availability of the different types of fruit (meaning that the draws for different fruit types are not completely independent). But yes, agreed: consumer behaviour would be much better described by a Wallenius distribution than a Fisher.
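To make the one-at-a-time vs. all-at-once distinction concrete, here is a small Python sketch of the two sampling mechanisms (function names are mine): Wallenius removes items sequentially, so earlier draws change the odds of later ones, while Fisher takes every item independently with odds proportional to its weight and then conditions on the total.

```python
import random

def wallenius_sample(m, omega, n, rng):
    # sequential draws: the urn shrinks, so odds shift as items leave
    m, out = list(m), [0] * len(m)
    for _ in range(n):
        w = [mi * wi for mi, wi in zip(m, omega)]
        i = rng.choices(range(len(m)), weights=w)[0]
        m[i] -= 1
        out[i] += 1
    return out

def fisher_sample(m, omega, n, rng):
    # simultaneous draws: take each item independently with odds omega[i],
    # then condition on the total equalling n (crude rejection sampling)
    while True:
        out = [sum(rng.random() < wi / (1.0 + wi) for _ in range(mi))
               for mi, wi in zip(m, omega)]
        if sum(out) == n:
            return out

rng = random.Random(7)
w_draw = wallenius_sample([8, 8], [3.0, 1.0], 10, rng)
f_draw = fisher_sample([8, 8], [3.0, 1.0], 10, rng)
```

So many customers buying single pieces over a day looks like the sequential (Wallenius) mechanism, while a single all-at-once purchase decided item by item is closer to Fisher.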

1 Like

Agreed. I think the Fisher is the direction I want to go. Also, it is interesting to think about the “aggregate” rollup of a day’s worth of sales. If you don’t have data for each sale, but only the inventory at the start and end of the day, then, as you said, this is somewhat in between the two non-central distributions.

Thanks for the feedback!

So I have decided to try to do inference on the Multivariate Wallenius distribution. There is a great deal of work on this distribution by Agner Fog (https://www.agner.org/random/nchyp1.pdf).

Question: the integral in the `_pmf` is very numerically unstable and needs special treatment to compute properly. Agner has a C++ library at https://www.agner.org/random/ . Instead of me trying to implement this all myself, is there a good procedure to just call the C++ code directly from Stan?

I could copy paste everything into my `functions` block, but that is C++ not Stan.

Any advice would be greatly appreciated!

1 Like

https://cran.r-project.org/web/packages/rstan/vignettes/external.html

But the `integrate_1d` function in Stan / Boost is designed for difficult integrals of analytic functions. I would try that first.

@mathDR Did you get this working? I have a similar problem and am hoping you will share your solution.

Thanks.

Hey @twitched, unfortunately no, I didn’t. The likelihood was returning 0 (and the log likelihood -infinity) for most/all inputs.

I am planning on revisiting this soon though and will update if I make any progress…

@mathDR Did you try to use the Agner Fog code, or did you try to implement it using `integrate_1d`?

I tried both. The former I didn’t really get to work (I tried adding it as a function to have Stan call it).

Most of the work, though, went into trying to get `integrate_1d` to work, and I didn’t manage that at all.

Thanks. If you find anything please post it. Or if you want to share your `integrate_1d` attempt I can attempt to see if I can get it working.

I’ll look at it again this weekend and post what I have

1 Like

This was my latest failed attempt:

```
functions {

real wallenius_pmf_integrand(real t, real xc, real[] theta, real[] x_r, int[] x_i)
{

int num_categories = size(theta);

vector[num_categories] y = to_vector(x_i[1:num_categories]);  // number of balls of each category that were drawn
vector[num_categories] m = to_vector(x_i[num_categories+1:]); // number of balls of each category in the urn
vector[num_categories] omega = to_vector(theta);  // weight of each category

real d = sum(omega .* (m-y));

real r = 1.0/d + 10.0;

real ret_val = 0.0;

for (i in 1:num_categories)
{
ret_val += y[i]*log1m_exp(r*omega[i]*log(t)); // log1m_exp is the stable form of log1m(exp(.))
}
ret_val += log(r*d) + (r*d-1.0)*log(t);

return exp(ret_val);
}

real MWNCHG_lpmf(int[] y, vector omega, int[] m, real[] xr, int[] xi) {
// Multivariate Wallenius Non-Central Hypergeometric pmf
// x = items drawn in each category
// m = total number of items in each category
// omega = weights of each category

int num_categories = size(y);

real theta[num_categories] = to_array_1d(omega);
int one_array[num_categories] = rep_array(1,num_categories);

real first_part;
real second_part;

first_part = sum(
lgamma(to_vector(m)+to_vector(one_array)
) - lgamma(to_vector(y)+to_vector(one_array)
) - lgamma(to_vector(m)-to_vector(y)+to_vector(one_array))
);
second_part = integrate_1d(wallenius_pmf_integrand, 0., 1., theta, xr, xi, 0.01);
return  first_part + log(second_part);

}

}

data{
int<lower=1> num_obs; // number of observations
int<lower=1> num_categories; // number of categories
int<lower=0> selected[num_obs,num_categories]; // number selected  data
int<lower=0> reserve[num_categories]; // how many of each color
int<lower=0> x_i[num_obs,2*num_categories]; //input to distribution

int<lower=0, upper=1> run_estimation;
}

transformed data {
real x_r[0];
}

parameters {
simplex[num_categories] omega;  // the inferred weights of the Multivariate Wallenius Non-Central Hypergeometric Distribution
}

model{
omega ~ dirichlet(rep_vector(1.0/num_categories,num_categories));

if (run_estimation==1){
for(n in 1:num_obs){
selected[n] ~ MWNCHG(omega, reserve, x_r, x_i[n]);
}
}
}
```

I generate some sampled data and then try to learn `omega`, i.e. the weights.

Okay, I got a working solution to the Wallenius noncentral hypergeometric likelihood. The trick was to encode the integral (for now) with a left-hand rule. Because we want the log of the PMF, we can use `log_sum_exp` and get a nice, compact, numerically stable call.

Right now I have hardcoded the number of points and, as I said, am using a left-hand rule (the integral has a singularity at 1). I want to post my current solution to get feedback (I will post my current `reduce_sum` version; it isn’t much more complicated). Going forward, I would like to use a trapezoidal method with a variable number of points (where previous function evaluations can be reused, as in the tanh-sinh method). Lastly, I will implement choosing the optimal `r` for transforming the integral (à la eqn. 19 from Agner Fog’s paper).

Anyway, here is the code:
```
functions {

real approximate_wallenus_integral(vector omega, int[] y, int[] m)
{
int num_categories = size(y);
int num_points = 100;
vector[num_categories] vec_y = to_vector(y);  // number of balls of each category that were drawn
vector[num_categories] vec_m = to_vector(m); // number of balls of each category in the urn

real d = sum(omega .* (vec_m-vec_y));
real r = 33.0; // TODO: need to do this optimally
real dt = 1./num_points;

vector[num_points-1] ret_val;

for (i in 1:num_points-1)
{
ret_val[i] = (r*d-1.0)*log(i*dt) + sum(vec_y .* log1m_exp((r*omega)*log(i*dt))); // should do trapz
}
return (log(r) + log(d) + log(dt) + log_sum_exp(ret_val)); // log(dt) normalises the left-hand rule; it is constant in omega, so it only shifts the log density by a constant
}

real MWNCHG_lpmf(int[] y, vector omega, int[] m) {
// Multivariate Wallenius Non-Central Hypergeometric pmf
// y = items drawn in each category
// m = total number of items in each category
// omega = weights of each category

int num_categories = size(y);

int one_array[num_categories] = rep_array(1,num_categories);

real first_part;
real second_part;

first_part = sum(
lgamma(to_vector(m)+to_vector(one_array)
) - lgamma(to_vector(y)+to_vector(one_array)
) - lgamma(to_vector(m)-to_vector(y)+to_vector(one_array))
);
second_part = approximate_wallenus_integral(omega,y,m);
return  first_part + second_part;

}

real partial_sum(
int[ , ] selected_slice,
int start,
int end,
data int[ , ] incount,
vector omega
)
{
real ret_sum = 0;
for (n in start:end)
{
ret_sum += MWNCHG_lpmf(selected_slice[n] | omega, incount[n]);
}
return ret_sum;

}

}

data{
int<lower=1> num_obs; // number of observations
int<lower=1> num_categories; // number of categories
int<lower=0> selected[num_obs,num_categories]; // number selected  data
int<lower=0> incount[num_obs,num_categories]; // how many of each color

int<lower=0, upper=1> run_estimation;
}

parameters {
simplex[num_categories] omega;  // the inferred weights of the Multivariate Wallenius Non-Central Hypergeometric Distribution
}

model {
int grainsize = 1;
omega ~ dirichlet(rep_vector(1.0, num_categories));

if (run_estimation==1){
target += reduce_sum(partial_sum, selected, grainsize, incount, omega);
}
}
```

I would love any suggestions!
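As a cross-check on the approach, the same log-domain left-hand rule is easy to replicate in Python and compare against a case with a known answer (with equal weights the transformed integral reduces to an ordinary hypergeometric probability). One thing I noticed while doing this: the quadrature needs a `log(dt)` term for correct normalisation, although it is constant in `omega` and so does not affect the posterior.

```python
import math

def log_wallenius_integral(y, m, omega, r, n_points=1000):
    """log of int_0^1 r*d*t^(r*d-1) * prod_i (1 - t^(r*omega_i))^y_i dt,
    with d = sum_i omega_i*(m_i - y_i), via a left-hand rule evaluated in
    the log domain (the integrand vanishes at both endpoints here, so the
    grid skips t = 0 and t = 1)."""
    d = sum(w * (mi - yi) for w, mi, yi in zip(omega, m, y))
    dt = 1.0 / n_points
    terms = []
    for i in range(1, n_points):
        log_t = math.log(i * dt)
        s = (r * d - 1.0) * log_t
        for w, yi in zip(omega, y):
            s += yi * math.log1p(-math.exp(r * w * log_t))  # log1m_exp
        terms.append(s)
    mx = max(terms)  # shifted log_sum_exp for numerical stability
    lse = mx + math.log(sum(math.exp(s - mx) for s in terms))
    return math.log(r) + math.log(d) + math.log(dt) + lse

# equal weights, m=[2,2], y=[1,1], r=2: the integrand is 2*t*(1-t)^2,
# whose integral over [0, 1] is exactly 1/6
approx = log_wallenius_integral([1, 1], [2, 2], [0.5, 0.5], r=2.0)
```

This matches the known value to well under a percent with 1000 grid points, which suggests the log-domain quadrature itself is sound and the remaining work is choosing `r` well.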

4 Likes

Linking a different discussion that has an analytical solution to the integral in the multivariate Wallenius noncentral hypergeometric distribution.

1 Like