Missing count data: use all possible combinations with fractional likelihood weights?

Hello, I have a nasty analysis where the data is full of holes (it was not collected by me). I tried to fill the holes algorithmically (a damn huge effort), but in some situations that's not possible:

[screenshot of example outbreak data, not reproduced here]

The above is an example of COVID-19 outbreak data in long-term care facilities, and my goal is to evaluate vaccine effectiveness. In the example, I have some missing values that I cannot impute deterministically. We have a total of 11 asymptomatic cases whose vaccination status is unknown. Since I have the denominators, I can say that those 11 are either partially or fully vaccinated, and that the number of partially vaccinated ones is between 0 and 2 (and consequently the fully vaccinated ones number between 9 and 11).

I was wondering if it's legitimate to create 3 copies of the data, one per feasible imputation, all weighted 1/3 at the likelihood level (the `| weights()` addition term in brms), with a random intercept at the outbreak level. The same would be done for every outbreak with missing data in the dataset. Would that make sense?
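
For concreteness, a minimal sketch of what the expanded single-outbreak data and the brms call could look like (all column names and the model formula here are made up, since the real data structure isn't shown):

```r
library(brms)

# Hypothetical expansion of one outbreak: each feasible imputation of the
# 11 unknown-status asymptomatic cases becomes one weighted copy.
expanded <- data.frame(
  outbreak  = "outbreak_1",
  imp_id    = 1:3,        # one copy per feasible imputation
  n_partial = 0:2,        # imputed partially vaccinated cases
  n_full    = 11 - 0:2,   # the rest are fully vaccinated (11, 10, 9)
  w         = 1/3         # uniform fractional likelihood weight
)

# Fractional weights enter brms through the weights() addition term;
# the formula is purely illustrative:
# fit <- brm(
#   cases | weights(w) ~ vax_status + (1 | outbreak),
#   family = poisson(),
#   data   = all_outbreaks_expanded
# )
```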

I know that the correct approach would be to repeat the analysis many times with random imputations and average the results, but the number of possible imputation combinations across the whole dataset is astronomically large.
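
For comparison, the standard route in brms would be `brm_multiple()`, which fits the model to each of a list of completed datasets and pools the posterior draws, so a modest number of random imputations could stand in for the impossible full enumeration. A sketch, with a made-up helper and toy data:

```r
library(brms)

# Toy stand-in for one outbreak with the example's missing counts
# (hypothetical columns; the real data has many outbreaks):
raw_data <- data.frame(outbreak = "outbreak_1",
                       n_unknown = 11, n_partial = NA, n_full = NA)

# Hypothetical helper: resolve the missing counts uniformly at random
# within the feasible range (here, 0-2 partially vaccinated).
draw_feasible_imputation <- function(data) {
  data$n_partial <- sample(0:2, 1)
  data$n_full    <- data$n_unknown - data$n_partial
  data
}

# A modest number of random imputations instead of full enumeration:
imputed_sets <- lapply(1:20, function(i) draw_feasible_imputation(raw_data))

# brm_multiple() fits the model once per completed dataset and pools draws:
# fit <- brm_multiple(
#   cases ~ vax_status + (1 | outbreak),
#   family = poisson(), data = imputed_sets
# )
```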

Another possible approach is to use non-uniform weights that are coherent with the rest of the data, but that is more complicated and adds degrees of freedom in choosing how to do it.

What if you modeled your missing data with suitable distributions?


Hi!

I didn't go that route, for a number of reasons:

  • Here we are speaking of imputing discrete counts, which I believe Stan cannot sample directly (discrete parameters have to be marginalized out). Also, the numbers are low, so I wasn't sure how much bias a continuous approximation would entail.
  • Is the approach you suggested computationally as heavy as multiple imputation?
  • The imputed values need to satisfy a number of hard constraints on their values and sums, and these constraints vary by outbreak (see the enumeration sketch after this list).
  • But here’s the most important one: I’m definitely not versed in Stan and am mostly a humble brms user, so I wouldn’t know how to manage such a complex data structure.
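
On the constraints point above: within a single outbreak, the feasible set is usually small enough to enumerate directly. A toy sketch with the example's numbers (11 unknown-status cases, at most 2 of them partially vaccinated; variable names are made up):

```r
# Enumerate all feasible splits of the 11 unknown-status cases,
# given that at most 2 of them can be partially vaccinated:
n_unknown   <- 11
max_partial <- 2

feasible <- data.frame(n_partial = 0:max_partial)
feasible$n_full <- n_unknown - feasible$n_partial
feasible$w      <- 1 / nrow(feasible)  # uniform weight per combination

feasible
#>   n_partial n_full         w
#> 1         0     11 0.3333333
#> 2         1     10 0.3333333
#> 3         2      9 0.3333333
```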

But maybe what I want to do is totally feasible in Stan and some tutorial could help me!

Anyway, aside from the specific problem, I’m also interested in whether the technique would make theoretical sense in general.
To summarise, the idea is to replicate each observation that has missing values, substituting all possible values (or a sample of them), then down-weight each replicate’s likelihood proportionally (1 over the number of replicates), and add a random effect to account for the repeated measurements.
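
One way to make the theoretical question precise, assuming (as with brms' `weights()`) that a weight multiplies a row's log-likelihood: with weight $1/K$ on each of $K$ replicates, the observation contributes a geometric mean of the candidate likelihoods, whereas marginalizing the missing values out (with a uniform prior over the $K$ candidates) would give an arithmetic mixture:

$$
\prod_{k=1}^{K} p(y_k \mid \theta)^{1/K}
\quad \text{(fractional weights)}
\qquad \text{vs.} \qquad
\frac{1}{K} \sum_{k=1}^{K} p(y_k \mid \theta)
\quad \text{(marginalization)}
$$

These coincide only in degenerate cases, so the weighting trick is not, in general, the same as marginalizing out the missing counts.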