I am facing a missing data problem with unobserved Y’s and a complete matrix of predictors X. I found the MICE package, which is linked to brms. Is there a preferable methodology for this case? I’d like to impute values accounting for a multilevel structure while also using all the information from the observed Y’s.
How about simply removing the rows with missing y, designing your model on the remaining data, and then inferring the missing y values from that model? I would argue that is even better than using MICE :) If you do use MICE, this paper could help you, but I really don’t see a need for MICE if you follow a principled Bayesian way:
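In code, that could look something like this minimal sketch (the names y, x, group, and the data frame dat are all hypothetical stand-ins for whatever the actual model uses):

```r
library(brms)

# Fit the model only to the rows where the outcome is observed ...
fit <- brm(y ~ x + (1 | group), data = subset(dat, !is.na(y)))

# ... and infer the missing outcomes from its posterior predictive
y_missing <- posterior_predict(fit, newdata = subset(dat, is.na(y)))
```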
@richard_mcelreath discusses the pros and cons of these approaches in Chapter 15(?) of the 2nd edition of his book Statistical Rethinking. In short, multiple imputation approaches were used when fully Bayesian imputation was infeasible because we didn’t have the computational power.
Multiple imputation can be fully Bayesian. Chained equations (the CE in MICE) do not form a proper joint distribution, so in that sense they can be considered not principled. Not all missing-value cases are simple, and we still often don’t have enough computational power. MICE is flexible for many missing-value problems, it scales well, and the individual conditional models can be Bayesian models. Joint latent variable models, which do define a proper joint distribution, would be an alternative; their inference is a bit challenging, but they could be more popular. As Stan doesn’t allow discrete parameters, imputation with (Bayesian) MICE is very useful.
This is a much easier case than missing values in X.
I recommend this, too, unless Y is multivariate and only some of the values in each row are missing. Even then it may be that the likelihood factorizes, so you can simply drop the likelihood terms for the missing cases.
Yes, the chained equations of MICE are not needed if only one variable has missing values.
I could remove the rows with missing y, but then I wouldn’t be able to fit the multilevel model I want, because for those missing values the combinations of X are unique. Thus, for the missing y’s I have unobserved levels, and I still want to model variation across clusters. I think a mixture model, such as a Dirichlet process, would be reasonable for this case, although I am afraid of the coding part. Since MICE is also linked to brms, it would make that part much easier for me.
My model has this structure:
y ~ (1 | x1) + (1 | x2) + (1 | x1:x2)
The data looks like this:
Y X1 X2
10 A J
14 A K
NA A L
8 B J
NA B K
NA B L
NA C J
11 C K
12 C L
Aki, why are chained equations not needed? If the unobserved Y’s depend on observed values, can’t you iterate, using the imputed y’s, given a cluster specification?
In brms you can also do imputation during model fitting, meaning that at each iteration of the MCMC the model estimates the imputed values of your missing variables. The caveat is that it only works for continuous variables.
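A minimal sketch of that approach, assuming a continuous outcome y with NAs and the grouping factors x1 and x2 from the model above (the mi() addition term on the response is what triggers the within-model imputation):

```r
library(brms)

# y | mi() tells brms to treat the missing y values as unknown
# parameters and estimate them jointly with the rest of the model
fit_mi <- brm(
  y | mi() ~ (1 | x1) + (1 | x2),
  data   = dat,        # keep the rows with NA in y
  family = gaussian()
)

# The imputed values show up as parameters (named Ymi[...]) in the draws
fit_mi
```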
My goal is to model the variation of X2 across all the levels of X1. I want to show that there might be heterogeneity in the variance across the levels of X2, which would imply that Y is not uniformly higher or lower across the X1 levels.
Maybe if I just run the model y ~ (1 | x1) + (1 | x2), I will not have problems with the imputation during model fitting. My question is whether the level-specific variances for X2 capture the uncertainty due to the missingness.
Maybe I am missing something (and maybe you resolved your issue in the meantime), but what would be wrong with fitting the model only to the data without missingness and using posterior_predict(fit, newdata = orig_data %>% filter(is.na(Y)), allow_new_levels = TRUE) to estimate the uncertainty you have about the missing Y values? This just takes the fitted uncertainty in your factor levels and draws the coefficients for the combinations of predictors not seen in the non-missing data from that uncertainty. It does assume that the unobserved combinations are in some sense “from the same population” as the observed ones, so it won’t help you if there is systematic bias in what is unobserved. But it might be a good start…
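Spelled out, the workflow could look like this minimal sketch (assuming the toy data above sit in a data frame called dat with columns Y, X1, X2; the object names are hypothetical):

```r
library(brms)
library(dplyr)

obs <- filter(dat, !is.na(Y))   # rows used for fitting
mis <- filter(dat, is.na(Y))    # rows whose Y we want to predict

fit <- brm(Y ~ (1 | X1) + (1 | X2) + (1 | X1:X2), data = obs)

# allow_new_levels = TRUE draws group-level effects for the X1:X2
# combinations that never appear in the observed rows, so the
# predictions reflect the fitted between-level variation
y_mis_draws <- posterior_predict(
  fit,
  newdata = mis,
  allow_new_levels = TRUE,
  sample_new_levels = "uncertainty"  # the default: draw new levels from the fitted population
)

dim(y_mis_draws)  # one row per posterior draw, one column per missing Y
```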
I am doing some missing data imputation on my response Y and on a predictor as well, but my data are organised in a hierarchical manner: each Y observation is nested within a temperature, for each individual, for a given sampling session.
I followed the vignette on data imputation.
I wanted to know whether imputation during fitting retains the model structure that I specify, and whether using MICE for multiple imputation would ‘break’ the hierarchical structure of my data.
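For the MICE route, a minimal sketch of how the two steps can fit together (the formula and the variable names temperature, individual, and session are hypothetical, based on the structure you describe):

```r
library(mice)
library(brms)

# Impute with mice; the default methods ignore the grouping structure,
# so consider a multilevel imputation method (e.g. the 2l.* methods) or
# at least include the cluster indicators as predictors in the imputation
imp <- mice(dat, m = 5, printFlag = FALSE)

# brm_multiple() fits the same multilevel model to each imputed dataset
# and pools the posterior draws, so the hierarchical structure of the
# analysis model is retained exactly as specified
fit <- brm_multiple(
  y ~ temperature + (1 | individual) + (1 | session),
  data = imp
)
```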