Hints for Modeling a censored multinomial distribution?

Hi. I was wondering (hoping?) if anyone had come across a censored multinomial model? In particular, the issue is this:

For each timestep t, I have a vector y_t = (y^t_1, y^t_2, \ldots, y^t_K) where \sum\limits_{i=1}^K y^t_i = N_t. I know the individual y^t_i and N_t at each timestep. Furthermore, I also know that there are specific y^t_i that are censored. That is, the true value \hat{y}^t_i \geq y^t_i, where the hat notation denotes “true.”

Has anyone ever encountered a model like this? Any hints would be most welcome!


Do you know the uncensored total at each time point?

Thanks for the reply. Yes, the uncensored total is still N_t. Basically, if one of the y^t_i is too low, then its contribution would be “made up” by the other groups in a way that I would like to learn.

It is a nasty combinatorial problem, but in principle you could write it down as a mixture model. Every multinomial array at time t that is consistent with the uncensored total and the individual lower bounds gets a probability, and the log_sum_exp over that whole set gives you the observed-data likelihood. You may have billions of components in that mixture unless your lower bounds are pretty tight.
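To make that concrete, here is a minimal Python sketch of the brute-force version of that mixture (all function names are made up, and it is only feasible when N_t and K are tiny, exactly because of the combinatorial blow-up mentioned above):

```python
import math
from itertools import product

def multinomial_logpmf(counts, probs):
    """Log pmf of a multinomial with the given counts and probabilities."""
    lp = math.lgamma(sum(counts) + 1)
    for c, p in zip(counts, probs):
        lp += c * math.log(p) - math.lgamma(c + 1)
    return lp

def censored_multinomial_loglik(total, lower, probs):
    """Observed-data log-likelihood, marginalizing over every count
    vector that sums to `total` and respects the per-bin lower bounds.
    `lower[i]` is the observed count for a censored bin (a lower bound);
    use 0 for bins with no constraint. Brute-force enumeration: the
    number of mixture components grows combinatorially in the slack."""
    slack = total - sum(lower)
    K = len(lower)
    terms = []
    # distribute the remaining `slack` items over the K bins every way
    for extra in product(range(slack + 1), repeat=K):
        if sum(extra) != slack:
            continue
        counts = [l + e for l, e in zip(lower, extra)]
        terms.append(multinomial_logpmf(counts, probs))
    m = max(terms)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(t - m) for t in terms))
```

With all-zero lower bounds this sums the pmf over every count vector, so it returns log(1) = 0; tightening any bound restricts the mixture and can only decrease the likelihood.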

Oh that is a very interesting idea. I was hoping to marginalize out most of these combinations (leading to a multidimensional integral). I think I will pursue both options…

If you know the multinomial probabilities then you can model each observation using the observable bins. For example if y_{i} can take values in the first bin, the second bin, or somewhere within the third to fifth bins then you can model it as a categorical variable with probabilities \pi_{1}, \pi_{2}, and \pi_{3} + \pi_{4} + \pi_{5}, respectively. If multiple observations have the same censoring pattern then you can build a multinomial over the aggregated bins and their marginal probabilities for those observations.
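A small Python sketch of that aggregation idea (hypothetical helper, assuming the original bin probabilities \pi are known): each observable bin is a list of original bin indices, and its probability is the sum of the member probabilities.

```python
import math

def aggregated_logpmf(pi, groups, counts):
    """Multinomial log pmf over aggregated bins. `groups` is a list of
    index lists, e.g. [[0], [1], [2, 3, 4]] means the third observable
    bin lumps together original bins 3-5; `counts` are the observed
    counts per aggregated bin; `pi` holds the original bin
    probabilities."""
    agg = [sum(pi[i] for i in g) for g in groups]  # marginal bin probs
    lp = math.lgamma(sum(counts) + 1)
    for c, p in zip(counts, agg):
        lp += c * math.log(p) - math.lgamma(c + 1)
    return lp
```

For a single observation this reduces to the categorical case described above: the probability of landing "somewhere in bins 3-5" is just \pi_3 + \pi_4 + \pi_5.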

Let me expand on the problem a bit: assume we have M items a person could select from, where each item m_i belongs to a group M_j, with \cup_{j=1}^K M_j = M. That is, M is made up of disjoint subsets. Now, we aggregate over a bunch of people selecting items and then add up how many are selected in each group. There are N total items selected, so we have a multinomial model over the K groups; but if an item m_i isn’t available for selection (i.e. we ran out of it), then a person picks another object. This non-availability of an item in a group M_j biases the total count in M_j low.

If I do the above for every day, then I get a timeseries of multinomials. My assumption is that how the probabilities \pi_1,\ldots,\pi_K change over time is stationary (say, modeled by a parametric function).
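For illustration, a minimal Python sketch of one such parametric function: each group gets a latent score that is linear in time, and the day's probabilities are the softmax of those scores (the linear form and all names here are purely hypothetical choices):

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def pi_at(t, alpha, beta):
    """Group probabilities at time t from per-group latent scores
    alpha_k + beta_k * t (a hypothetical linear parameterization).
    The softmax guarantees a valid simplex at every timestep."""
    return softmax([a + b * t for a, b in zip(alpha, beta)])
```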

If I, say, just dropped those days where one of the groups had censored data, then I would let my model interpolate over those dates (in generated_quantities), but the data is such that pretty much every day will have a group with censored data.

Now, I would like the model not only to infer what is happening with the current data, but also what happens as we run out of items in each of the subsequent groups going forward. So, say we run out of items in group q; then we typically see an increase in group p, etc.
Does this help clarify the issue?

Hmm. Interesting!
You wrote:

Do they then pick an item in the same group, or in another group? I assume it’s in a different group, also because you write:

But do you then want to decipher how large the unobserved demand is? In this case, more q than what was sold, and less p than what was sold.

This means that not only:

but that this non-availability biases the total count in some other (unknown) group high!

Last question: do they always pick an object in a different group? Or is it possible that they see that something is not available and then they don’t buy at all?

Asking because it’s a tiny bit related to my post in A collection of new posts on logit choice models.

Hey Uri, thanks for the response and questions! Basically, they will always choose/buy (under the conditions of the model), so something will be selected (which goes against the outside-option framework in logit formulations).

As for what they choose, that depends. Currently my generative model assumes that the exogenous variables are 1. time, 2. how much of each item is available, and 3. how “good” each item is (based on historical selection). So my assumption is that when there is a lot of good stuff, the temporal component will dominate (i.e. this is the equilibrium behavior), but when the system does not have enough stuff, or not enough good stuff, then different dynamics happen (the choosing of different items/groups). I would love for the model to learn these dynamics (along with the equilibrium behavior), if that makes sense.

I will also take a look at the post you linked! Thanks for sharing…

To keep the notation precise – what you want is not a censored multinomial distribution. A censored distribution is when you have a consistent set of probabilities but only observe some combination of bins – that’s straightforward to take into account by marginalizing probabilities over the bins as I suggested.

The main difficulty with your approach is that there’s no reason why the joint probabilities for N items will be the same for N - M items – consumers’ preferences can change depending on the available alternatives.

As Uri mentioned, you want to take a multinomial logit approach here, but you have to be careful. The outside good is not necessarily an actual alternative; rather, it can be just a mathematical tool to calibrate the individual probabilities relative to each other. In other words, the outside good serves as an anchor against which all alternatives are judged.

More formally, you can build a model that learns the latent parameters that determine the probability of an alternative relative to the outside good for any collection of alternatives, then build an appropriate multinomial distribution for that collection by softmax’ing the latent parameters. By modeling everything relative to that common outside good you maintain the interpretability of the softmax’ed latent parameters across time, which is what’s needed to well-define your dynamics.
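A minimal Python sketch of that anchoring idea (hypothetical names): the latent utilities are measured relative to an outside good whose utility is fixed at 0, and the softmax is taken over whatever alternatives happen to be available that day.

```python
import math

def choice_probs(utilities, available):
    """Choice probabilities over the currently available alternatives.
    `utilities` are latent utilities measured relative to a common
    outside good (whose utility is pinned at 0); restricting the
    softmax to `available` keeps the utilities comparable across
    different choice sets and across time."""
    z = [utilities[i] for i in available]
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return dict(zip(available, (v / s for v in e)))
```

Because the utilities are anchored to the same outside good, dropping an alternative just renormalizes the remaining probabilities while preserving their relative odds, which is what keeps the parameters interpretable as availability changes.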

Okay, that makes a lot of sense. As for the notation: if I know historically which items were not able to be chosen, how should I refer to that data? In particular, I chose “censored” because, with N choosers and M items, if N > M then some items must run out, and I am calling those items censored on that day.

Is this consistent with generic censoring? (Maybe not w.r.t. multinomial, but in general).



Censoring implies a common latent distribution and observations that aggregate over multiple possible outcomes.

There’s no generic name for what you propose here – each observation comes from a different process due to the varying experimental configuration. “Varying experimental configurations” might be the most compact nomenclature.