Hints for modeling a censored multinomial distribution?


Hi. I was wondering (hoping?) if anyone had come across a censored multinomial model? In particular, the issue is this:

For each timestep t, I have a vector y_t = (y^t_1, y^t_2, \ldots, y^t_K) where \sum\limits_{i=1}^K y^t_i = N_t. I know the individual y^t_i and N_t at each timestep. I also know that specific y^t_i are censored. That is, the true value satisfies \hat{y}^t_i \geq y^t_i, where the hat denotes the true (uncensored) count.

Has anyone ever encountered a model like this? Any hints would be most welcome!




Do you know the uncensored total at each time point?



Thanks for the reply. Yes, the uncensored total is still N_t. Basically, if one of the y^t_i is too low, then its contribution would be “made up” by the other groups in a way that I would like to learn.



It is a nasty combinatorial problem, but in principle, you could write it down as a mixture model. Every multinomial array at time t that is consistent with the uncensored total and the individual lower bounds gets a probability, and log_sum_exp over the log probabilities of all those arrays gives you a complete data likelihood. You may have billions of components in that mixture unless your lower bounds are pretty tight.
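To make the mixture idea concrete, here is a minimal brute-force sketch in plain Python for a single timestep. It rests on assumptions that are not stated in the thread: the category probabilities `p` are treated as fixed and known, censored categories use their observed count as the lower bound, uncensored categories get a lower bound of 0, and every function name is hypothetical. It does not model how the missing counts are redistributed across groups; it only enumerates the arrays that are consistent with the total and the lower bounds and sums their probabilities on the log scale.

```python
import math

NEG_INF = float("-inf")


def n_components(total, lower):
    """Number of count vectors z with sum(z) == total and z[i] >= lower[i]."""
    k = len(lower)
    slack = total - sum(lower)  # counts left to distribute freely
    if k == 0 or slack < 0:
        return 0
    return math.comb(slack + k - 1, k - 1)  # stars and bars


def consistent_arrays(total, lower):
    """Yield every count vector z with sum(z) == total and z[i] >= lower[i]."""
    k = len(lower)
    if k == 0:
        raise ValueError("need at least one category")
    slack = total - sum(lower)
    if slack < 0:
        return  # the lower bounds already exceed the total

    def distribute(i, remaining):
        # Split `remaining` extra counts over categories i..k-1.
        if i == k - 1:
            yield (remaining,)
            return
        for extra in range(remaining + 1):
            for tail in distribute(i + 1, remaining - extra):
                yield (extra,) + tail

    for extras in distribute(0, slack):
        yield tuple(lo + e for lo, e in zip(lower, extras))


def multinomial_logpmf(z, log_p):
    """Log multinomial pmf of counts z given log probabilities log_p."""
    out = math.lgamma(sum(z) + 1)
    for zi, lpi in zip(z, log_p):
        out -= math.lgamma(zi + 1)
        if zi > 0:  # skip 0 * log(0)
            out += zi * lpi
    return out


def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    if not xs:
        return NEG_INF
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def censored_multinomial_loglik(total, lower, p):
    """Log probability that a Multinomial(total, p) draw respects the lower bounds."""
    log_p = [math.log(pi) if pi > 0 else NEG_INF for pi in p]
    terms = [multinomial_logpmf(z, log_p) for z in consistent_arrays(total, lower)]
    return log_sum_exp(terms)
```

The number of mixture components is C(S + K - 1, K - 1), where S is N_t minus the sum of the lower bounds, so checking `n_components(N_t, lower)` before enumerating shows quickly whether the bounds are tight enough for this brute-force approach to be feasible.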



Oh that is a very interesting idea. I was hoping to marginalize out most of these combinations (leading to a multidimensional integral). I think I will pursue both options…