Hello, I have a nasty analysis where the data are full of holes (not collected by me). I tried to fill the holes algorithmically (a huge effort), but in some situations that’s still not possible:

The above is an example of COVID outbreak data in long-term care facilities, and my goal is to evaluate vaccine effectiveness. In the example, I have some missing values that I cannot impute deterministically. There are 11 asymptomatic cases in total whose vaccination status is unknown. Since I have the denominators, I can say that those 11 are either partially or fully vaccinated, and that the number of partially vaccinated ones is between 0 and 2 (so the number of fully vaccinated ones is between 9 and 11).
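To make the bounds concrete, here is a minimal sketch in Python that lists every admissible split of the 11 unknown cases. The function name and argument names are hypothetical, and the bounds (0 to 2 partially vaccinated) are simply the ones stated above.

```python
def admissible_imputations(n_unknown, partial_min, partial_max):
    """Return every (partial, full) split of the unknown cases within the bounds."""
    return [(p, n_unknown - p) for p in range(partial_min, partial_max + 1)]


# 11 asymptomatic cases with unknown status, 0 to 2 of them partially vaccinated
splits = admissible_imputations(n_unknown=11, partial_min=0, partial_max=2)
print(splits)  # [(0, 11), (1, 10), (2, 9)]
```

So for this outbreak there are exactly three candidate imputations.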

I was wondering if it’s legitimate to create 3 copies of the data, one for each admissible imputation, all weighted 1/3 at the likelihood level (the | weights() term in brms), with a random intercept at the outbreak level. The same would be done for every outbreak with missing data in the dataset. Would that make sense?
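As a rough sketch of what the expanded data would look like (the column names and the outbreak label are hypothetical, and this only builds the rows; it says nothing about whether the weighting is statistically valid):

```python
def weighted_copies(outbreak_id, splits):
    """Build one row per admissible imputation, each with weight 1 / number of copies."""
    weight = 1.0 / len(splits)
    rows = []
    for copy_id, (partial, full) in enumerate(splits, start=1):
        rows.append({
            "outbreak": outbreak_id,   # grouping variable for the random intercept
            "copy": copy_id,           # which imputation this row represents
            "asym_partial": partial,   # imputed partially vaccinated cases
            "asym_full": full,         # imputed fully vaccinated cases
            "weight": weight,          # column that would be passed to weights()
        })
    return rows


rows = weighted_copies("outbreak_A", [(0, 11), (1, 10), (2, 9)])
for row in rows:
    print(row)
```

The weights of the copies sum to 1 within the outbreak, so the outbreak contributes the equivalent of one observation to the likelihood rather than three.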

I know that the correct approach would be to repeat the analysis many times with random imputations and average the results, but the number of possible imputation combinations is enormous, since it multiplies across outbreaks.
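To illustrate how quickly the combinations grow, here is a small sketch. The per-outbreak counts are made up for illustration (20 outbreaks with 3 admissible imputations each) and are not taken from my dataset.

```python
import math


def n_joint_imputations(splits_per_outbreak):
    """Number of joint imputations: the product of the per-outbreak counts."""
    return math.prod(splits_per_outbreak)


# Hypothetical: 20 outbreaks, each with 3 admissible imputations
hypothetical_counts = [3] * 20
total = n_joint_imputations(hypothetical_counts)
print(total)  # 3486784401
```

Even with only three options per outbreak, enumerating every joint combination is out of reach, which is why one would sample random imputations instead.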

Another possible approach is to use non-uniform weights that are coherent with the rest of the data (more complicated, and with more degrees of freedom in choosing how to do it).
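One of many ways this could be done, sketched under loud assumptions: treat each unknown case as partially vaccinated with some probability estimated from the complete records, compute the binomial probability of each admissible split, and renormalise over the admissible range. The value 0.1 below is a placeholder, not an estimate from my data, and the binomial model itself is an assumption.

```python
import math


def binomial_weights(n_unknown, partial_min, partial_max, p_partial):
    """Weights proportional to a binomial pmf, truncated to the admissible range."""
    raw = {}
    for k in range(partial_min, partial_max + 1):
        raw[k] = (math.comb(n_unknown, k)
                  * p_partial ** k
                  * (1 - p_partial) ** (n_unknown - k))
    norm = sum(raw.values())
    # Renormalise so the weights of the admissible splits sum to 1
    return {k: v / norm for k, v in raw.items()}


# p_partial = 0.1 is a placeholder value, not derived from the real data
weights = binomial_weights(n_unknown=11, partial_min=0, partial_max=2, p_partial=0.1)
for k, w in weights.items():
    print(k, round(w, 3))
```

These weights would replace the uniform 1/3 in the weight column; the choice of probability model is exactly the extra degree of freedom mentioned above.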