Multinomial with missing data

stemangiola · February 6, 2021, 1:01am

I had a look at this topic, but I could not figure out whether it applies directly to my issue.

I have multinomial-like data (30 components, 20 observations, counts ranging from 0 to 1000), and some of the data points (about 20) are missing.

Is there any approach I could use to model a multinomial with missing data?

I would like to point out the brms approach to missing values for discrete data:

https://cran.r-project.org/web/packages/brms/vignettes/brms_missings.html

olli0601 · February 6, 2021, 12:06pm

Your outcomes are discrete, so could you do it in the generated quantities block, i.e. predict the multinomial outcomes with the _rng function according to the joint posterior of your probabilities?
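In Stan this step would live in `generated quantities` and use `multinomial_rng(theta, N)`. As a language-neutral sketch of what that does, the plain-Python code below draws one replicate row per posterior draw of the probability vector, keeping the row total fixed. The posterior draws are faked with a flat Dirichlet (the hypothetical helper `fake_posterior_pi`) purely for illustration; in a real analysis they would come from the fitted model.

```python
import random
from collections import Counter

def fake_posterior_pi(n_draws, n_components, rng):
    """HYPOTHETICAL stand-in for posterior draws of the simplex pi.
    Each draw is a flat Dirichlet sample built from gamma variates;
    in a real analysis these draws come from the fitted model."""
    draws = []
    for _ in range(n_draws):
        g = [rng.gammavariate(1.0, 1.0) for _ in range(n_components)]
        s = sum(g)
        draws.append([x / s for x in g])
    return draws

def multinomial_draw(total, pi, rng):
    """One Multinomial(total, pi) draw: allocate `total` items to
    categories with probabilities pi and count them."""
    picks = rng.choices(range(len(pi)), weights=pi, k=total)
    counts = Counter(picks)
    return [counts.get(k, 0) for k in range(len(pi))]

def predict_rows(pi_draws, total, rng):
    """One replicate row per posterior draw, analogous to calling
    multinomial_rng(pi, total) once per iteration in generated quantities."""
    return [multinomial_draw(total, pi, rng) for pi in pi_draws]

rng = random.Random(1)
pi_draws = fake_posterior_pi(n_draws=1000, n_components=5, rng=rng)
y_rep = predict_rows(pi_draws, total=200, rng=rng)
print(len(y_rep), len(y_rep[0]))    # 1000 5
print({sum(row) for row in y_rep})  # {200}
```

Note that this only works once a posterior for the probabilities exists, which is exactly the point raised in the reply below.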

stemangiola · February 7, 2021, 12:07am

Thanks Olli,

However, I do not have a joint posterior. The joint posterior is what I want to estimate.

I have a matrix of multinomial observations with some NAs at random positions.
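One common way to pass such a matrix to Stan, sketched here as an assumption rather than anything proposed in the thread, is to drop only the NA cells instead of whole rows and to supply the observed cells in long format (row index, component index, count). The helper `to_long` and the 3×4 matrix below are made up for illustration, with `None` standing in for NA.

```python
def to_long(counts):
    """Convert a count matrix with None for missing cells into
    parallel lists (row index, component index, count) covering the
    observed cells only. Indices are 1-based, as Stan arrays are."""
    row_idx, comp_idx, y = [], [], []
    for n, row in enumerate(counts, start=1):
        for k, value in enumerate(row, start=1):
            if value is not None:
                row_idx.append(n)
                comp_idx.append(k)
                y.append(value)
    return row_idx, comp_idx, y

# Made-up example: 3 observations x 4 components, two missing cells.
counts = [
    [12, 0, 7, None],
    [3, 25, None, 9],
    [40, 1, 2, 6],
]
row_idx, comp_idx, y = to_long(counts)
print(len(y))                      # 10
print(row_idx[:4], comp_idx[:4])   # [1, 1, 1, 2] [1, 2, 3, 1]

# Dropping incomplete rows instead would keep only one observation.
complete_rows = sum(all(v is not None for v in r) for r in counts)
print(complete_rows)               # 1
```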

olli0601 · February 7, 2021, 12:27am

I see: you don't want to remove an entire observation just because one of its components is missing, right?

Is it a good idea to treat the margin N (the sum of the components in one observation row) as fixed?

If there is no strong rationale for that, does the following help?

$$\text{Multinomial}(y \mid N, \pi)\,\text{Poisson}(N \mid \lambda) = \prod_i \text{Poisson}(y_i \mid \pi_i \lambda), \qquad N = \sum_i y_i$$
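Writing out both sides shows why the identity holds, using only $N = \sum_i y_i$ and $\sum_i \pi_i = 1$:

$$
\begin{aligned}
\text{Multinomial}(y \mid N, \pi)\,\text{Poisson}(N \mid \lambda)
&= \frac{N!}{\prod_i y_i!}\prod_i \pi_i^{y_i}\cdot\frac{\lambda^{N} e^{-\lambda}}{N!} \\
&= \prod_i \frac{\pi_i^{y_i}\,\lambda^{y_i}}{y_i!}\; e^{-\lambda \sum_i \pi_i} \\
&= \prod_i \frac{(\pi_i \lambda)^{y_i}\, e^{-\pi_i \lambda}}{y_i!}
= \prod_i \text{Poisson}(y_i \mid \pi_i \lambda).
\end{aligned}
$$

Read in reverse, dividing the product of Poissons by $\text{Poisson}(N \mid \lambda)$ leaves the multinomial, which is the sense in which a multinomial is a set of independent Poissons conditioned on their sum.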

which should allow you to treat each data point individually and simply leave out the missing ones.
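A minimal sketch of that likelihood in plain Python, assuming (for illustration only) one free rate `lam[n]` per observation row, so that row totals may differ, and a single simplex `pi` shared by all rows. Each observed cell contributes a log Poisson(y_nk | pi_k * lam_n) term and missing cells (`None`) contribute nothing. The function names are hypothetical. The last lines check numerically, on a complete row, that the product of Poissons equals Multinomial(y | N, pi) * Poisson(N | lambda).

```python
import math

def poisson_lpmf(y, mu):
    """log Poisson(y | mu) for integer y >= 0 and mu > 0."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)

def multinomial_lpmf(y, pi):
    """log Multinomial(y | sum(y), pi) for a complete row y."""
    n = sum(y)
    return (math.lgamma(n + 1) - sum(math.lgamma(v + 1) for v in y)
            + sum(v * math.log(p) for v, p in zip(y, pi)))

def poisson_loglik(counts, pi, lam):
    """Sum of log Poisson(y[n][k] | pi[k] * lam[n]) over observed
    cells only; cells that are None (missing) are skipped."""
    total = 0.0
    for row, lam_n in zip(counts, lam):
        for y_nk, pi_k in zip(row, pi):
            if y_nk is not None:
                total += poisson_lpmf(y_nk, pi_k * lam_n)
    return total

pi = [0.5, 0.3, 0.2]                    # made-up shared simplex
lam = [50.0, 120.0]                     # made-up per-row rates
counts = [[20, 18, 9], [None, 40, 30]]  # second row has a missing cell

ll = poisson_loglik(counts, pi, lam)

# Sanity check on the complete first row.
row = counts[0]
lhs = poisson_loglik([row], pi, [lam[0]])
rhs = multinomial_lpmf(row, pi) + poisson_lpmf(sum(row), lam[0])
print(abs(lhs - rhs) < 1e-9)   # True
```

Because of the factorisation, for a complete row the extra Poisson(N | lambda_n) term involves only lambda_n; if lambda_n has its own prior, independent of pi, the Poisson terms carry the same information about pi as the multinomial would.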

stemangiola · February 7, 2021, 12:34am

Unfortunately this is not the case: each observation can vary in its total count.

Sounds promising, if only I could understand what that is :)

Could you please be more explicit about what each term means in this case, where the missing and observed data points enter, and why it is multinomial * Poisson?

olli0601 · February 7, 2021, 1:21am

Yes – see the paper here :-)

So what I suggest is to model each of your data points as a conditionally independent Poisson count. This is almost the same as the multinomial, and, given that you have missing values, you might not even want to treat the sum of your components as fixed.

stemangiola · February 7, 2021, 2:50am

Thanks, very good source.

So what would a Stan model look like, according to

Hence, a multinomial distribution is equivalent to a collection of independent Poisson distributions conditioned on their sum.
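The quoted statement can be checked by simulation. The plain-Python sketch below (rates `mu`, total `N`, and the outcome `target` are made up) draws independent Poisson counts, keeps only the draws whose sum equals N, and compares how often one particular outcome occurs with its multinomial probability under pi = mu / sum(mu). Knuth's multiplication method is used for the Poisson draws because it is short and adequate for small rates.

```python
import math
import random

def poisson_draw(mu, rng):
    """Knuth's multiplication method; adequate for small mu."""
    limit = math.exp(-mu)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def multinomial_pmf(y, pi):
    """Multinomial(y | sum(y), pi) probability."""
    n = sum(y)
    coef = math.factorial(n)
    for v in y:
        coef //= math.factorial(v)   # stays an exact integer
    return coef * math.prod(p ** v for p, v in zip(pi, y))

rng = random.Random(2021)
mu = [1.0, 2.0, 3.0]   # made-up Poisson rates
N = 6                  # the sum we condition on
target = (1, 2, 3)     # one particular outcome with that sum

kept = hits = 0
for _ in range(100_000):
    y = tuple(poisson_draw(m, rng) for m in mu)
    if sum(y) == N:    # condition on the sum
        kept += 1
        hits += (y == target)

empirical = hits / kept
pi = [m / sum(mu) for m in mu]
exact = multinomial_pmf(target, pi)
print(round(exact, 4))                # 0.1389
print(abs(empirical - exact) < 0.02)  # expected True, up to Monte Carlo error
```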

In particular, how would I express "conditioned on their sum"?

I don't think I can ignore the conditioning on the sum, because I don't have that many components; there could be as few as 5. In that case the inverse relation between components would be too strong to ignore (with independent Poissons), if this is what you mean.
