Per-patient random effect as a poor man's missing data handler

I am helping a friend run a meta-analysis of the associations between different mutations of the same protein complex and various phenotypes. Luckily, patient-level data are available for the included studies. The phenotypes are coded only as present/absent, and not all studies in the meta-analysis report all the phenotypes of interest. Further, some studies do not report some predictors (e.g. age). I handle missingness in the phenotypes by converting the data to long form and letting all effects depend on phenotype, which seems to work great. However, this setup does not let me handle missing age with the built-in approach in brms, because multiple rows in the long data now correspond to the same patient and therefore need to share the same estimate of the missing age. Being lazy, and since age does not seem to have a strong effect, I want to avoid developing a full Stan model for this case, so I came up with a workaround:

For every patient with missing age, I add a per-patient random effect that stands in for the age effect. Specifically (the code is slightly edited for clarity, but this is essentially how I run the model; age_std is standardized age):

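library(dplyr)

data_age <- data %>% 
  mutate(age_mod = if_else(is.na(age_std), 0, age_std),   # observed age, 0 if missing
         guess_age = if_else(is.na(age_std), 1, 0),        # 1 if age is missing
         # grouping level: the patient ID when age is missing, shared "No" otherwise
         ID_for_age = factor(if_else(is.na(age_std), as.character(ID), "No"))
  )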
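To make the recoding concrete, here is a minimal sketch on a hypothetical toy long-form data set (the data are made up for illustration). It assumes ID_for_age holds the patient ID only for rows where age is missing, which is the behaviour described in the text below.

```r
library(dplyr)

# Hypothetical toy long-form data: patient 1 has a known age, patients 2 and 3
# do not; each patient has one row per reported phenotype (A and B).
data <- data.frame(
  ID = c(1, 1, 2, 2, 3, 3),
  phenotype = c("A", "B", "A", "B", "A", "B"),
  age_std = c(0.5, 0.5, NA, NA, NA, NA)
)

data_age <- data %>%
  mutate(
    # age enters as-is when known, 0 otherwise
    age_mod = if_else(is.na(age_std), 0, age_std),
    # indicator switching on the per-patient "guessed age" term
    guess_age = if_else(is.na(age_std), 1, 0),
    # grouping level: the patient ID when age is missing, a shared dummy otherwise
    ID_for_age = factor(if_else(is.na(age_std), as.character(ID), "No"))
  )

print(data_age)
```

Patients 2 and 3 each get their own level of ID_for_age, so the guess_age term can take a separate value for each of them. Patient 1's rows fall into the dummy level "No", but since guess_age is 0 on those rows, that level never affects the prediction.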

library(brms)

formula_gene_source_age <- brmsformula(
  phenotype_value ~ 1 + <<some stuff>> +
    (0 + age_mod || phenotype) + (0 + guess_age || ID_for_age / phenotype),
  family = "bernoulli")

When age is known, guess_age is zero, so only the age_mod term affects the prediction. When age is unknown, age_mod is zero and guess_age is one, so each patient gets their own estimate (nested within phenotype).
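Spelled out for patient i and phenotype p, the two varying-effect terms give the following logit-scale linear predictor. This is a sketch assuming ID_for_age holds the patient ID for the patients with missing age (as described above), with \eta^{\text{rest}}_{ip} standing for the contribution of the elided <<some stuff>> terms:

$$
\eta_{ip} =
\begin{cases}
\eta^{\text{rest}}_{ip} + b_p \, \text{age}_i, & \text{age}_i \text{ observed},\\
\eta^{\text{rest}}_{ip} + u_i + v_{ip}, & \text{age}_i \text{ missing},
\end{cases}
\qquad
b_p \sim \mathcal{N}(0, \sigma_{\text{age}}),\quad
u_i \sim \mathcal{N}(0, \sigma_u),\quad
v_{ip} \sim \mathcal{N}(0, \sigma_v).
$$

Written this way, the missing-age branch is a zero-mean patient offset plus a patient-by-phenotype offset, with its own standard deviations \sigma_u and \sigma_v. Nothing in the model ties it to b_p or to the distribution of observed ages, whereas imputing age_i would make the term b_p \, \text{age}_i, linked across phenotypes through a single age value per patient.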

I get very similar inferences about the quantities of interest whether I handle age this way or leave age out as a predictor entirely (although some posterior predictive checks look better with age included). So my conclusion is that modelling age brings little benefit, and I don’t want to spend time developing a correct missingness model.

My questions are:

  1. Can this “poor man’s missingness” mislead me in some way?
  2. Is there a better way to express this type of missingness?

Thanks for any ideas/hints/…

  • Operating System: Windows
  • brms Version: 2.5.0

To be honest, it’s hard for me to grasp the implications of your approach. In particular, I am not sure what the varying effect of guess_age captures. Maybe I haven’t thought about it hard enough yet.

A perhaps more straightforward approach would be multiple imputation, for instance with the mice package, followed by fitting the model with brm_multiple.
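One way to combine multiple imputation with the long format is to impute at the patient level, before pivoting, so that every row of a given patient receives the same imputed age in each completed data set. The following is a minimal sketch under stated assumptions: the wide table patients (one row per patient, with columns ID, age_std and 0/1 phenotype columns pheno_A, pheno_B) and its toy data are hypothetical, and the imputation uses mice's default predictive mean matching with only the phenotypes as predictors. A real imputation model would likely need more covariates, especially if whole studies lack age.

```r
library(mice)
library(dplyr)
library(tidyr)

# Hypothetical toy data: one row per patient, with some ages and some
# phenotypes (pheno_B) not reported.
set.seed(1)
n <- 60
patients <- data.frame(
  ID = factor(seq_len(n)),
  age_std = rnorm(n),
  pheno_A = rbinom(n, 1, 0.5),
  pheno_B = rbinom(n, 1, 0.3)
)
patients$age_std[sample(n, 15)] <- NA
patients$pheno_B[sample(n, 10)] <- NA

# Long form for the model: one row per observed (patient, phenotype) pair;
# unreported phenotypes are simply dropped, as in the original setup.
long <- patients %>%
  select(-age_std) %>%
  pivot_longer(starts_with("pheno_"),
               names_to = "phenotype", values_to = "phenotype_value") %>%
  filter(!is.na(phenotype_value))

# Impute at the patient level (ID is excluded so it is not used as a predictor).
# mice also fills in pheno_B here, but only the imputed ages are used below.
imp <- mice(select(patients, -ID), m = 5, printFlag = FALSE, seed = 123)

# mice::complete is called explicitly because tidyr also exports complete().
completed <- mice::complete(imp, action = "all")

# Attach each patient's imputed age to all of that patient's long-form rows.
long_list <- lapply(completed, function(d) {
  ages <- data.frame(ID = patients$ID, age_std = d$age_std)
  left_join(long, ages, by = "ID")
})
```

The list can then be passed to brm_multiple, which fits the model to each completed data set and combines the posterior draws, e.g. brm_multiple(phenotype_value ~ 1 + (1 + age_std || phenotype) + (1 | ID), data = long_list, family = bernoulli()), where that formula is only a placeholder for the real one. Because each data set carries a single age per patient, the age_mod/guess_age recoding is no longer needed.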
