Hey
I want to model data in which repeated measurements were taken from animals.
I want to compare different treatment groups while correcting for animal-specific effects.
The model I want to fit in brms looks something like this:
outcome ~ treatment_group + (1 | animal_id)
However, for some measurements the animal_id is missing. These animals were also measured multiple times, but I cannot link their measurements to each other via animal_id.
So for these animals I cannot estimate the animal-specific effect.
I could remove these measurements, but I believe they still contain useful information for the model.
I was wondering how best to model this in brms.
What I have tried so far is to assign a new unique ID to each measurement with a missing ID. So all these measurements are assumed to come from different animals (even if they do not).
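A minimal sketch of this placeholder-ID approach in base R, assuming the data sit in a data frame with the columns `outcome`, `treatment_group` and `animal_id` used in the formula above. The data frame `dat`, its values, the `unknown_` prefix (assumed not to clash with any real ID) and the extra `id_missing` flag column are all hypothetical.

```r
# Hypothetical example data: animal_id is NA where the animal could not be identified
dat <- data.frame(
  outcome          = c(1.2, 1.5, 0.9, 1.1, 2.0, 1.8, 1.4),
  treatment_group  = c("A", "A", "B", "B", "A", "B", "B"),
  animal_id        = c("a1", "a1", "b1", "b1", NA, NA, NA),
  stringsAsFactors = FALSE
)

# Remember which rows were unlabeled (handy later for a sensitivity analysis)
missing_id <- is.na(dat$animal_id)
dat$id_missing <- missing_id

# Give every unlabeled measurement its own unique placeholder ID, so the
# model treats each one as coming from a separate (unknown) animal
dat$animal_id[missing_id] <- paste0("unknown_", seq_len(sum(missing_id)))
dat$animal_id <- factor(dat$animal_id)

# The model from the post can then be fit on all rows, e.g.
# brms::brm(outcome ~ treatment_group + (1 | animal_id), data = dat)
```

Here the three unlabeled rows become three extra levels of `animal_id`, each contributing exactly one observation to its own random intercept.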
Are there flaws with this approach?
Thanks in advance
I'm not sure how to include the observations with missing animal_id in the model of the other data. Perhaps you could fit a version of the model with just those observations, and then use the parameter summaries to inform the priors for the model fitted to the observations that do have an animal_id.
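One way this two-step idea might look, as a rough sketch only. It swaps in a plain `lm()` fit as a quick stand-in for the brms fit on the unlabeled subset (without animal_id there is no random effect to estimate), and turns the estimate and standard error of the treatment coefficient into a normal prior string. The data frame `unlabeled`, its values and the inflation factor are hypothetical. Because several unlabeled rows presumably come from the same animal, the naive standard error is likely too small, hence the widening. The unlabeled rows would then be left out of the main model so they are not counted twice.

```r
# Hypothetical example data: only the measurements without animal_id
unlabeled <- data.frame(
  outcome          = c(2.0, 1.8, 1.4, 2.2, 1.6, 1.3),
  treatment_group  = c("A", "B", "B", "A", "A", "B"),
  stringsAsFactors = FALSE
)

# Stand-in for "a version of the model with just those observations":
# an ordinary linear model, since no animal-level effect can be estimated
fit_unlabeled <- lm(outcome ~ treatment_group, data = unlabeled)
est <- coef(summary(fit_unlabeled))["treatment_groupB", c("Estimate", "Std. Error")]

# The SE ignores the (unknown) within-animal correlation, so it is widened;
# the factor 2 is an arbitrary, hypothetical choice
inflate <- 2
prior_string <- sprintf("normal(%.3f, %.3f)",
                        est[["Estimate"]], inflate * est[["Std. Error"]])
print(prior_string)

# Could then be passed to the model fitted on the labeled data, e.g.
# brms::set_prior(prior_string, class = "b", coef = "treatment_groupB")
```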
I think the approach of assigning a separate new animal_id to each unlabeled observation is reasonable in this context. I would probably do it the same way if I encountered such a situation. Relative to having all the labels, I assume this approach will be somewhat more conservative, but I guess that is exactly what we want, since it expresses the fact that some data are missing.
As a sensitivity analysis, I would still also fit the model to the complete observations only. Presumably, if there is not too much unlabeled data, the results will be very similar.
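A minimal sketch of how the two data sets for this comparison could be prepared, assuming a data frame with the columns from the formula in the question. The data frame `dat`, its values and the `unknown_` prefix are hypothetical. The brms fits themselves are only shown as comments.

```r
# Hypothetical example data; animal_id is NA for unlabeled measurements
dat <- data.frame(
  outcome          = c(1.2, 1.5, 0.9, 1.1, 2.0, 1.8, 1.4, 1.0),
  treatment_group  = c("A", "A", "B", "B", "A", "B", "B", "A"),
  animal_id        = c("a1", "a1", "b1", "b1", "a2", "a2", NA, NA),
  stringsAsFactors = FALSE
)

# Share of measurements that the complete-case analysis would drop
frac_unlabeled <- mean(is.na(dat$animal_id))

# Data set 1: all rows, unlabeled rows get their own placeholder IDs
dat_all <- dat
miss <- is.na(dat_all$animal_id)
dat_all$animal_id[miss] <- paste0("unknown_", seq_len(sum(miss)))

# Data set 2: complete observations only
dat_complete <- dat[!is.na(dat$animal_id), ]

# Fit the same model to both and compare the treatment_group estimates, e.g.
# fit_all      <- brms::brm(outcome ~ treatment_group + (1 | animal_id), data = dat_all)
# fit_complete <- brms::brm(outcome ~ treatment_group + (1 | animal_id), data = dat_complete)
# brms::fixef(fit_all); brms::fixef(fit_complete)
```

Reporting `frac_unlabeled` alongside the two sets of estimates makes it easy to judge whether "not too much unlabeled data" actually holds.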
Thanks a lot for your insights!
Yes, being conservative might be a good thing, as this analysis will be used to inform the design and sample size of a new, similar study.