Per-patient random effect as a poor man's missing data handler

martinmodrak · September 24, 2018, 11:56am

I am helping a friend run a meta-analysis of the relations between different mutations of the same protein complex and various phenotypes. Luckily, patient-level data are available for the included studies. The phenotypes are coded only as present/not present, and not all studies in the meta-analysis reported all the phenotypes of interest. Furthermore, some studies did not report some predictors (e.g. age). I handle missingness in the phenotypes by converting the data to long form and having all effects depend on phenotype, which seems to work great. However, this setup does not let me handle missingness in age with the built-in approach, because multiple rows in the data now correspond to the same patient and thus need to share the same estimate for the missing age. Being lazy, and since age does not seem to have a strong effect, I want to avoid developing a full Stan model for this case, so I came up with a workaround:

For every patient that is missing age data I add a per-patient random effect that should represent the age effect. To be specific: (code slightly edited for clarity, but basically this is how I run the model, age_std is standardized age)

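library(dplyr)
library(brms)

data_age <- data %>%
  mutate(age_mod = if_else(is.na(age_std), 0, age_std),
         guess_age = if_else(is.na(age_std), 1, 0),
         # Patients with missing age keep their own ID as a grouping level;
         # everyone else shares the dummy level "No"
         ID_for_age = factor(if_else(is.na(age_std), as.character(ID), "No"))
  )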

formula_gene_source_age <- brmsformula(phenotype_value ~ 1 + <<some stuff>> +
  (0 + age_mod||phenotype) + (0 + guess_age || ID_for_age / phenotype) , family = "bernoulli")

Now when age is known, guess_age is zero and only the age_mod effect influences the prediction. When age is unknown, age_mod is zero, guess_age is 1, and each patient gets their own estimate (with phenotype nested within patient).
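To make the coding concrete, here is a minimal sketch on invented data. The toy data frame, its IDs and its values are assumptions for illustration only and do not come from the real analysis. It follows the description above (each patient with missing age gets their own grouping level) and uses base R so that it runs without extra packages.

```r
# Toy long-format data: two patients, two phenotypes each.
# Patient "p2" has no recorded age. All values are invented.
toy <- data.frame(
  ID        = c("p1", "p1", "p2", "p2"),
  phenotype = c("A", "B", "A", "B"),
  age_std   = c(0.7, 0.7, NA, NA),
  stringsAsFactors = FALSE
)

missing_age <- is.na(toy$age_std)

# Known age: age_mod carries the standardized age, guess_age is switched off.
# Missing age: age_mod is 0, guess_age is 1, and the patient's own ID
# becomes the grouping level for the per-patient effect.
toy$age_mod    <- ifelse(missing_age, 0, toy$age_std)
toy$guess_age  <- ifelse(missing_age, 1, 0)
toy$ID_for_age <- factor(ifelse(missing_age, toy$ID, "No"))

print(toy)
```

Both rows of "p2" share the grouping level "p2", so they share one per-patient effect, while the rows of "p1" fall into the dummy level "No", where guess_age is 0 and the effect contributes nothing to the linear predictor.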

I am getting very similar inferences about the quantities of interest when handling age this way and when not including age as a predictor at all (but some pp checks look better with age included). So my conclusion is that modelling age has little benefit and I don’t want to spend time developing a correct missingness model.

My questions are:

Do you see some way this “poor man’s missingness” can mislead me?
Is it possible to express this type of missingness in some better way?

Thanks for any ideas/hints/…

Operating System: Windows
brms Version: 2.5.0

paul.buerkner · September 25, 2018, 6:55pm

It’s hard for me to grasp the implications of your approach, to be honest. In particular, I am not sure what the varying effect of guess_age captures. Maybe I haven’t thought about it hard enough yet.

A perhaps more straightforward approach could be to use multiple imputation, for instance with mice, and then use brm_multiple.
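A rough sketch of how that could look for long-format data follows; it is an illustration under stated assumptions, not a tested recipe for this dataset. The helper names attach_age and fit_with_imputation are hypothetical, as are the assumed inputs: a patient-level table patient_covariates (one row per patient, with ID, a partly missing age_std and at least one other covariate to inform the imputation) and a long-format table long_data with an ID column. Age is imputed once per patient and then copied to all rows of that patient, so every row of a patient sees the same imputed age within each completed data set.

```r
# Copy one age per patient onto every long-format row of that patient.
attach_age <- function(long_data, ages) {
  long_data$age_std <- ages$age_std[match(long_data$ID, ages$ID)]
  long_data
}

# Hypothetical wrapper: impute at the patient level, then fit on each
# completed long-format data set and pool the posterior draws.
fit_with_imputation <- function(long_data, patient_covariates, formula, m = 5) {
  ids <- patient_covariates$ID
  # Leave the ID out of the imputation model; mice needs at least one
  # further column besides age_std to impute from.
  to_impute <- patient_covariates[, names(patient_covariates) != "ID", drop = FALSE]
  imp <- mice::mice(to_impute, m = m, printFlag = FALSE)
  completed <- mice::complete(imp, action = "all")  # list of m data frames

  long_list <- lapply(completed, function(d) {
    attach_age(long_data, data.frame(ID = ids, age_std = d$age_std))
  })

  # brm_multiple accepts a list of data frames and combines the fits
  brms::brm_multiple(formula, data = long_list, family = brms::bernoulli())
}

# Hypothetical call (not run here; it needs mice, brms and a Stan toolchain):
# fit <- fit_with_imputation(
#   long_data, patient_covariates,
#   brms::brmsformula(phenotype_value ~ 1 + (0 + age_std || phenotype))
# )
```

Imputing on the patient-level table rather than on the long data is the design choice that matters here: running mice directly on the long form would give the same patient a different imputed age in each row.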
