MICE missing values on the response

Hi everyone,

I am facing a missing data problem with unobserved Y’s and a complete matrix of predictors X. I found the MICE package, which is linked to brms. Is there a preferred methodology for this case? I’d like to impute values taking a multilevel structure into account and also using all the information from the observed Y’s.

Thank you!

How about simply removing the rows with missing y, then designing your model and inferring the missing y values from that model? I would argue that is even better than using mice :) If you do use mice, this paper could help you, but I really don’t see a need for mice if you follow a principled Bayesian way.

@richard_mcelreath discusses the pros and cons of the approaches in Chapter 15(?) of the 2nd edition of his book Statistical Rethinking. In short, we used multiple imputation approaches when Bayesian imputation was infeasible because we didn’t have the computational power.

Multiple imputation can be fully Bayesian. Chained equations (the CE in MICE) don’t form a proper joint distribution, so they can be considered unprincipled. Not all missing value cases are simple, and we still often don’t have enough computational power. MICE is flexible for many missing value problems and scales well, and the individual models can be Bayesian models. Joint latent variable models would be an alternative that defines a proper joint distribution. Inference for them is a bit challenging, but they could be more popular. As Stan doesn’t allow discrete parameters, imputation with (Bayesian) MICE is very useful when the missing variables are discrete.

This is a much easier case than missing values in X.

I recommend this, too, unless Y is multivariate and only some values for each row are missing. It can still be possible that the likelihood factorizes and you can just drop likelihood terms for the missing cases.
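To spell out why dropping the missing cases loses nothing here, a short derivation may help. It is a sketch under assumptions not stated in the thread: the y's are conditionally independent given the predictors and the parameters θ (which, in a multilevel model, include the group effects and their hyperparameters), and the missingness mechanism is ignorable.

```latex
% Each missing y_j integrates to 1, so only the observed terms remain
p(\theta \mid y_{\mathrm{obs}}, X)
  \propto p(\theta)
    \prod_{i \in \mathrm{obs}} p(y_i \mid x_i, \theta)
    \prod_{j \in \mathrm{mis}} \int p(y_j \mid x_j, \theta)\, dy_j
  = p(\theta) \prod_{i \in \mathrm{obs}} p(y_i \mid x_i, \theta)

% The missing values are then ordinary posterior predictions
p(y_{\mathrm{mis}} \mid y_{\mathrm{obs}}, X)
  = \int p(y_{\mathrm{mis}} \mid x_{\mathrm{mis}}, \theta)\,
         p(\theta \mid y_{\mathrm{obs}}, X)\, d\theta
```

Under those assumptions, fitting on the observed rows and then predicting the missing rows gives the same distribution for the missing y's as imputing them during fitting.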

Yes, the chained equations of MICE are not needed if only one variable has missing values.

I could remove the rows with missing y, but then I wouldn’t be able to fit the multilevel model I want, because the combinations of X for those missing values are unique. Thus, for the missing y’s I have unobserved levels, and I still want to model variation across clusters. I think that a mixture model, such as a Dirichlet process, would be reasonable for this case, although I am afraid of the coding part. Since MICE is also linked to brms, it would make that part a lot easier for me.

My model has this structure:
y ~ (1|x1) + (1|x2) + (1|x1:x2)

The data looks like this:
Y X1 X2
10 A J
14 A K
NA A L
8 B J
NA B K
NA B L
NA C J
11 C K
12 C L

Aki, why are chained equations not needed? If the unobserved Y’s depend on the observed values, can’t you iterate using the imputed y’s, given a cluster specification?

In brms you can also do imputation during model fitting, meaning that at each iteration of the MCMC the model will estimate imputed values for your missing variables. The caveat is that it only works for continuous variables.

You can see the “Imputation during model fitting” example in the missing data vignette from brms:
https://cran.r-project.org/web/packages/brms/vignettes/brms_missings.html
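For a single response with NAs, the vignette's `mi()` syntax might be applied roughly as follows. This is only a sketch: the column names Y, X1, X2 are taken from the example data earlier in the thread, the model is the simpler two-term one, and `fit_with_mi` is a hypothetical helper that is defined but not called here, because fitting needs brms and a working Stan toolchain.

```r
# An R formula is unevaluated, so it can be built without loading brms.
# `Y | mi()` asks brms to keep rows with NA in Y and estimate those values.
f_mi <- Y | mi() ~ (1 | X1) + (1 | X2)

# Hypothetical helper: fits the model on the full data, NAs in Y included.
fit_with_mi <- function(data) {
  brms::brm(f_mi, data = data)
}
```

Calling `fit_with_mi(orig_data)` would then treat the missing Y's as extra parameters in the fitted model.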

Because all X1 and X2 are observed, you need just one “equation” for Y|X1,X2.

If after removing missing Y’s the data doesn’t have information to identify your model Y|X1,X2, then you can’t impute Y.
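One way to see what information is left after removing the missing Y’s is to check which grouping levels still appear in the observed rows. The sketch below uses the nine-row example posted earlier in the thread; the variable names (`d`, `obs`, `mis`, `cell`) are just illustrative.

```r
# Example data as posted in the thread
d <- data.frame(
  Y  = c(10, 14, NA, 8, NA, NA, NA, 11, 12),
  X1 = rep(c("A", "B", "C"), each = 3),
  X2 = rep(c("J", "K", "L"), times = 3)
)
obs <- d[!is.na(d$Y), ]
mis <- d[is.na(d$Y), ]

# Levels of X1 and X2 that occur only in rows with missing Y
new_x1 <- setdiff(unique(mis$X1), unique(obs$X1))
new_x2 <- setdiff(unique(mis$X2), unique(obs$X2))

# X1:X2 cells that occur only in rows with missing Y
cell <- function(df) paste(df$X1, df$X2, sep = ":")
new_cells <- setdiff(cell(mis), cell(obs))

print(new_x1)     # empty: every X1 level has an observed Y
print(new_x2)     # empty: every X2 level has an observed Y
print(new_cells)  # the four X1:X2 cells with no observed Y
```

In this toy data every level of X1 and X2 is observed at least once, so only the interaction cells are new. With at most one observation per cell, the interaction variance is also hard to separate from the residual variance.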

This is an excellent vignette, and it shows how you can give brms the data with NAs in Y. However, it doesn’t remove the identifiability problem.

I don’t see clusters here.

My goal is to model the variation of X2 across all the levels of X1. I want to show that there might be heterogeneity in the variance of the levels of X2, which would imply that Y is not uniformly higher or lower across X1 levels.
Maybe if I just run the model y ~ (1|x1) + (1|x2), I will not have problems with the imputation during model fitting. My question is whether the level-specific variances for X2 capture the uncertainty due to missingness.

Maybe I am missing something (and maybe you resolved your issue in the meantime), but what would be wrong with fitting only the data without missingness and using posterior_predict(fit, newdata = orig_data %>% filter(is.na(Y)), allow_new_levels = TRUE) to estimate the uncertainty you have about the missing Y values? This takes the fitted uncertainty in your factor levels and uses it to draw the coefficients for the combinations of predictors not seen in the non-missing data. It has to assume that the unobserved combinations are in some sense “from the same population” as the observed ones, so it won’t help you if there is systematic bias in the unobserved ones. But it might be a good start…
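Spelled out, that workflow could look something like the sketch below. The helper names `split_on_missing_y` and `predict_missing_y` are made up for illustration, the columns Y, X1, X2 follow the example data in the thread, and the brms step is wrapped in a function that is not called here because it needs brms and a Stan toolchain.

```r
# Split the data into rows with observed and with missing Y (base R only)
split_on_missing_y <- function(orig_data) {
  is_mis <- is.na(orig_data$Y)
  list(
    obs = orig_data[!is_mis, , drop = FALSE],
    mis = orig_data[is_mis, , drop = FALSE]
  )
}

# Hypothetical helper: fit on the observed rows, then predict the missing ones
predict_missing_y <- function(orig_data) {
  parts <- split_on_missing_y(orig_data)
  fit <- brms::brm(
    Y ~ (1 | X1) + (1 | X2) + (1 | X1:X2),
    data = parts$obs
  )
  # allow_new_levels = TRUE lets brms handle X1:X2 cells absent from the fit;
  # the result has one row per posterior draw and one column per missing row
  brms::posterior_predict(fit, newdata = parts$mis, allow_new_levels = TRUE)
}
```

The columns of the returned matrix can then be summarised (for example with quantiles) to describe the uncertainty about each missing Y.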

Hope that helps!
