Predict.brms in multivariate model with imputation

wigglyhypersurface · November 3, 2018, 11:14pm


Operating System: Windows 10
brms Version: 2.6.0

I’m trying to use brms to get posterior predictions for multiple variables, some of which have missing values that are imputed during model fitting. However, predict returns NaNs (with "NAs produced" warnings) when I run it without specifying the response. It does return predictions for the first imputed variable, but every row with an NA in the imputed variable gets NA predictions. Is this the expected behavior?

MWE:

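library(brms)

y <- rnorm(100)
# x is missing for roughly 20% of rows; rnorm(100) (not rnorm(1)) so observed x values vary
x <- ifelse(sample(c(0, 1), size = 100, replace = TRUE, prob = c(.2, .8)) == 0, NA, rnorm(100))
z <- rnorm(100)

dat <- data.frame(x, y, z)

# x is imputed from z; y is predicted from the (partly imputed) x
form1 <- bf(x | mi() ~ z)
form2 <- bf(y ~ mi(x))

mod <- brm(form1 + form2, data = dat)

# new data that again contains missing x values
newdat <- data.frame(x = ifelse(sample(c(0, 1), size = 100, replace = TRUE, prob = c(.2, .8)) == 0, NA, rnorm(100)), z = rnorm(100))

predict(mod, newdata = newdat)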

# There were 19 warnings (use warnings() to see them)

warnings()

Warning messages:
1: In rnorm(4000L, mean = c(NA_real_, NA_real_, NA_real_, … : NAs produced
(warnings 2 to 19 are identical)

paul.buerkner · November 5, 2018, 12:55pm

This behavior is intended, although not ideal. During the sampling process, missing data are estimated in the form of additional parameters, but of course these parameters can only be used for the original data, not for new data. The current behavior is thus to just leave them NA.

For your example, it is possible to predict x by z and then use the imputed values of x to predict y, but this only works if the “missing value graph” is acyclic, that is, if it contains no cycles (such as y ~ mi(x); x ~ mi(z); z ~ mi(y)).
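To make "predict x by z, then use the imputed x to predict y" concrete, here is a minimal base-R sketch of that chained prediction. It is not how brms implements anything: the posterior draws are simulated stand-ins (with a real fit they would be extracted from the model, e.g. with as.matrix(mod), under different parameter names), and the column names b0_x, b1_x, sigma_x, b0_y, b1_y, sigma_y as well as the function predict_chain() are hypothetical. For each draw, missing x values are sampled from the x ~ z part and then fed into the y ~ mi(x) part, so imputation uncertainty carries over into the y predictions.

```r
# Hypothetical stand-ins for posterior draws (NOT extracted from a brms fit).
set.seed(1)
n_draws <- 1000
draws <- data.frame(
  b0_x = rnorm(n_draws, 0, 0.1),           # intercept of x ~ z
  b1_x = rnorm(n_draws, 0.5, 0.1),         # slope of x ~ z
  sigma_x = abs(rnorm(n_draws, 1, 0.05)),  # residual SD of x
  b0_y = rnorm(n_draws, 0, 0.1),           # intercept of y ~ mi(x)
  b1_y = rnorm(n_draws, 0.3, 0.1),         # slope of y ~ mi(x)
  sigma_y = abs(rnorm(n_draws, 1, 0.05))   # residual SD of y
)

# Hypothetical helper: chained posterior prediction along the acyclic graph z -> x -> y.
predict_chain <- function(draws, newdata) {
  n_obs <- nrow(newdata)
  y_pred <- matrix(NA_real_, nrow = nrow(draws), ncol = n_obs)
  for (s in seq_len(nrow(draws))) {
    # Step 1: keep observed x; draw the missing x values from x ~ z under draw s
    x_s <- newdata$x
    miss <- is.na(x_s)
    x_s[miss] <- rnorm(sum(miss),
                       mean = draws$b0_x[s] + draws$b1_x[s] * newdata$z[miss],
                       sd = draws$sigma_x[s])
    # Step 2: predict y from the completed x using the same draw s
    y_pred[s, ] <- rnorm(n_obs,
                         mean = draws$b0_y[s] + draws$b1_y[s] * x_s,
                         sd = draws$sigma_y[s])
  }
  y_pred  # draws x observations
}

newdat_small <- data.frame(x = c(0.2, NA, -1, NA), z = c(1, 0, -0.5, 2))
y_draws <- predict_chain(draws, newdat_small)
colMeans(y_draws)  # posterior predictive means of y, no NAs for rows with missing x
```

Because both steps use the same draw s, this only works when every variable can be filled in before it is needed, which is the acyclicity condition described above.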

As a result, before I implement this automatic imputation for new data, I have to add a function which checks for cycles.
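A cycle check of this kind can be sketched with Kahn's topological sort. This is a hypothetical base-R helper, not part of brms: mi_order() takes a named list mapping each variable to the variables it uses through mi(), and returns an order in which the variables could be imputed for new data (dependencies first), or NULL if the graph contains a cycle such as y ~ mi(x); x ~ mi(z); z ~ mi(y).

```r
# Hypothetical helper (not a brms function).
# deps: named list, deps[[v]] = variables that v uses via mi().
# Returns a dependency-first ordering, or NULL if the graph has a cycle.
mi_order <- function(deps) {
  nodes <- union(names(deps), unlist(deps, use.names = FALSE))
  # number of not-yet-resolved dependencies for each variable
  indeg <- setNames(integer(length(nodes)), nodes)
  for (v in names(deps)) indeg[v] <- length(unique(deps[[v]]))
  out <- character(0)
  ready <- nodes[indeg == 0]  # variables with no mi() dependencies
  while (length(ready) > 0) {
    u <- ready[1]
    ready <- ready[-1]
    out <- c(out, u)
    # each variable that depends on u now has one fewer unresolved dependency
    for (v in names(deps)) {
      if (u %in% deps[[v]]) {
        indeg[v] <- indeg[v] - 1L
        if (indeg[v] == 0L) ready <- c(ready, v)
      }
    }
  }
  # any variable left unresolved lies on (or behind) a cycle
  if (length(out) < length(nodes)) NULL else out
}

mi_order(list(x = "z", y = "x"))            # the model from the question
mi_order(list(y = "x", x = "z", z = "y"))   # the cyclic example
```

For the model in the question the first call returns "z" "x" "y" (predict x from z, then y from x), while the cyclic example returns NULL, so automatic imputation for new data would have to be refused in that case.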

Feel free to open an issue for all of this on https://github.com/paul-buerkner/brms

wigglyhypersurface · November 5, 2018, 10:05pm

That makes sense. Thanks! And thanks for brms and being very responsive.
