Multinomial with missing data

I had a look at this topic, but I could not figure out whether it applies directly to my issue.

I have multinomial-like data (30 components, 20 observations, counts ranging from 0 to 1000), and some of those data points are missing (roughly 20 of them).

Is there any approach I could use to model a multinomial with missing data?


I would like to point out the brms approach for discrete data

https://cran.r-project.org/web/packages/brms/vignettes/brms_missings.html

Your outcomes are discrete. Could you do it in the generated quantities block, i.e. predict the multinomial outcomes with the _rng function according to the joint posterior of your probabilities?

Thanks Olli,

however, I do not have a joint posterior. The joint posterior is the thing I want to estimate.

I have a matrix of multinomial observations where there are some NAs in random spots.

I see – you don’t want to remove an entire observation simply because one of the components in that observation is missing, right?

Is it a good idea to model the margin N (the sum of the components in one observation row) as fixed?

If there is no strong rationale for that, does the following help,

$$\text{Multinomial}(y;\, N, \pi) \cdot \text{Poisson}(N;\, \lambda) = \prod_i \text{Poisson}(y_i;\, \pi_i \lambda)$$

which should allow you to look at each data point on its own and simply ignore the missing ones.
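To make the identity concrete, here is a minimal numerical check in Python, using only the standard library. The counts, probabilities and rate are made-up values for illustration, and the helper names `poisson_logpmf` and `multinomial_logpmf` are hypothetical. It only confirms that the two sides agree on the log scale for one example; it is not a model.

```python
import math

def poisson_logpmf(k, rate):
    """log Poisson(k; rate) for an integer k >= 0 and rate > 0."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)

def multinomial_logpmf(y, probs):
    """log Multinomial(y; N, probs), where N = sum(y)."""
    n = sum(y)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(k + 1) for k in y)
    return log_coef + sum(k * math.log(p) for k, p in zip(y, probs))

# Made-up example values (assumptions, not data from this thread).
y = [3, 0, 7, 5]
probs = [0.2, 0.1, 0.4, 0.3]   # pi, sums to 1
lam = 12.5                     # lambda, the rate of the row total N

# Left-hand side: Multinomial(y; N, pi) * Poisson(N; lambda), in logs.
lhs = multinomial_logpmf(y, probs) + poisson_logpmf(sum(y), lam)

# Right-hand side: product of independent Poisson(y_i; pi_i * lambda), in logs.
rhs = sum(poisson_logpmf(k, p * lam) for k, p in zip(y, probs))

print(abs(lhs - rhs) < 1e-9)
```

Because the right-hand side is a plain sum over cells, dropping a cell from that sum is all it takes to leave a missing value out of the likelihood.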


Unfortunately this is not the case: each observation can have a different total count.

Sounds promising, if only I could understand what that is :)

Could you please be more explicit about what each term is in this case, where the missing/present observations come in, and why multinomial * Poisson?

Yes – see the paper here :-)

So what I suggest is to model each of your observations as conditionally independent Poisson. It is almost the same as the Multinomial, and potentially you might not even want to consider the sum of your components fixed, given that you have missing values.
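As a rough sketch of what "conditionally independent Poisson, skipping the missing cells" could look like, the Python function below sums Poisson log-likelihood terms over the observed cells only. Everything named here is hypothetical: `None` stands in for an NA, `probs` plays the role of pi, and giving each row its own rate in `lams` is an assumption of this sketch (a single shared lambda is the special case where all entries are equal).

```python
import math

def poisson_logpmf(k, rate):
    """log Poisson(k; rate) for an integer k >= 0 and rate > 0."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)

def loglik_observed(counts, probs, lams):
    """Independent-Poisson log-likelihood over observed cells only.

    counts: list of rows; None marks a missing cell.
    probs:  component probabilities pi_i (summing to 1).
    lams:   one total rate per row (an assumption of this sketch).
    """
    total = 0.0
    for row, lam in zip(counts, lams):
        for k, p in zip(row, probs):
            if k is None:
                continue  # a missing cell contributes nothing
            total += poisson_logpmf(k, p * lam)
    return total

# Made-up toy data: 2 observations, 3 components, 2 missing cells.
counts = [[4, None, 6],
          [10, 3, None]]
probs = [0.3, 0.2, 0.5]
lams = [12.0, 20.0]

ll = loglik_observed(counts, probs, lams)
print(ll)
```

In a sampler, `probs` and `lams` would be parameters rather than fixed numbers; the point here is only that each observed cell enters the likelihood separately.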

Thanks, very good source.

So what would a Stan model look like, according to

Hence, a multinomial distribution is equivalent to a collection of independent
Poisson distributions conditioned on their sum.

In particular, how to express

conditioned on their sum

I don’t think I can ignore the condition on their sum, because I don’t have that many components; there could be as few as 5. In this case the inverse relation between components would be too strong to ignore (by using independent Poissons), if this is what you mean.
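One possible reading of "conditioned on their sum" for a row with NAs, sketched in Python under the assumption that missingness is unrelated to the count values: if the cells are independent Poisson(pi_i * lambda), then the observed cells, conditioned on the sum of the observed cells, follow a multinomial whose probabilities are the observed pi_i renormalized to sum to 1. The helper names and example numbers below are hypothetical; the check at the end confirms the relation numerically and shows that lambda cancels.

```python
import math

def poisson_logpmf(k, rate):
    """log Poisson(k; rate) for an integer k >= 0 and rate > 0."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)

def multinomial_logpmf(y, probs):
    """log Multinomial(y; N, probs), where N = sum(y)."""
    n = sum(y)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(k + 1) for k in y)
    return log_coef + sum(k * math.log(p) for k, p in zip(y, probs))

def row_loglik_conditioned(row, probs):
    """Multinomial log-likelihood of the observed cells of one row,
    conditioned on the sum of those observed cells (None = missing)."""
    obs = [(k, p) for k, p in zip(row, probs) if k is not None]
    y_obs = [k for k, _ in obs]
    p_sum = sum(p for _, p in obs)
    p_obs = [p / p_sum for _, p in obs]  # renormalize over observed cells
    return multinomial_logpmf(y_obs, p_obs)

# Made-up example: 5 components, 2 of them missing.
row = [4, None, 6, 2, None]
probs = [0.25, 0.15, 0.3, 0.2, 0.1]
lam = 15.0  # arbitrary rate; it cancels in the conditional

cond = row_loglik_conditioned(row, probs)

# Check: joint independent-Poisson terms for the observed cells, minus the
# Poisson log-pmf of their sum, equals the conditional multinomial term.
obs = [(k, p) for k, p in zip(row, probs) if k is not None]
joint = sum(poisson_logpmf(k, p * lam) for k, p in obs)
marginal = poisson_logpmf(sum(k for k, _ in obs), lam * sum(p for _, p in obs))
print(abs(cond - (joint - marginal)) < 1e-9)
```

Under this reading the negative dependence between components is kept, and each row simply uses a multinomial over whichever components were observed.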