Missing data in multiple correlated predictors

A helpful vignette on modeling missing data in brms was posted recently: https://cran.r-project.org/web/packages/brms/vignettes/brms_missings.html

In that example, though, only one of the predictors (chl) has missing data. And Paul writes, “Since age contains no missing values, we only have to take special care of bmi and chl.”

Imagine, however, that there is missing data in both of the predictors, as in:

```
X1  X2
NA  10
12  13
25  22
11  NA
NA  NA
20  17
```

One can imagine an imputation model that simultaneously uses X1 to impute X2 and X2 to impute X1. Conceptually, however, it is not clear to me what happens when both values are missing for the same observation (as in the 5th row above).

What are the special considerations in this case?

For clarification, I imagine the brms syntax would be something like the following, though more experienced users may notice errors:

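```r
# Main model for Y, plus one imputation model per predictor
bform <- bf(Y | mi() ~ mi(X1) + mi(X2)) +
  bf(X1 | mi() ~ X2) +
  bf(X2 | mi() ~ X1) +
  set_rescor(FALSE)  # no residual correlation between the responses
fit_imp <- brm(bform, data = dat)
```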

Since all parameters are estimated simultaneously, so are the missing values. That is, imputing X1 and X2 at the same time, each using the other variable as a predictor, poses no problem; in a row where both are missing (like the 5th), both values are simply estimated jointly with everything else. Please make sure, though, that X1 and X2 are always wrapped in mi() whenever they appear on the right-hand side of a formula.
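A minimal sketch of the question's model with that advice applied. The simulated `dat`, its coefficients, and its missingness pattern are hypothetical (not from the thread); Y is given a few missing values too, since that is what `Y | mi()` in the question implies. It only generates the Stan code with `make_stancode()` instead of fitting, so it runs quickly.

```r
library(brms)

# Hypothetical synthetic data: two correlated predictors and an outcome
set.seed(123)
n <- 100
X1 <- rnorm(n)
X2 <- 0.6 * X1 + rnorm(n, sd = 0.8)
Y <- 1 + 0.5 * X1 - 0.3 * X2 + rnorm(n)
dat <- data.frame(Y, X1, X2)

# Hypothetical missingness: rows 1-5 miss both predictors,
# rows 6-15 miss only X1, rows 16-25 miss only X2, rows 26-28 miss Y
dat$X1[1:15] <- NA
dat$X2[c(1:5, 16:25)] <- NA
dat$Y[26:28] <- NA

# Every appearance of X1 or X2 on a right-hand side is wrapped in mi()
bform <- bf(Y | mi() ~ mi(X1) + mi(X2)) +
  bf(X1 | mi() ~ mi(X2)) +
  bf(X2 | mi() ~ mi(X1)) +
  set_rescor(FALSE)

# Generate the Stan code without compiling or sampling; the missing values
# of each response are declared as parameters (Ymi_X1, Ymi_X2, Ymi_Y)
scode <- make_stancode(bform, data = dat)
```

Fitting then works as in the question, e.g. `fit_imp <- brm(bform, data = dat)`. In rows 1 to 5, both X1 and X2 are sampled as parameters at every iteration.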
