I’m attempting to fit a multivariate model with brms, with the aim of estimating (imputing) missing data for one of the response variables.
My question is: does brms use information from the non-missing response variables when estimating missing values with the predict function?
Here’s my model:
bf1mi <- bf(y1 ~ (1|ID|i))
bf2mi <- bf(y2|mi() ~ (1|ID|i))
m2 <- brm(bf1mi + bf2mi, family = gaussian(), data = dat2mi)
Here y1 is complete and y2 has some missing (NA) entries. The variable i is just an indicator for the data rows, included so that the shared ID term lets the random intercepts of the two responses be correlated (I think I’ve got that right?).
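To spell out what I think these formulas specify, here is the model written as equations. This is only my reading of the syntax, and the symbols (j for the row, u for the random intercepts, and so on) are my own notation rather than anything brms reports. I am also leaving out any residual correlation between the two responses.

```latex
\begin{aligned}
y_{1j} &= \beta_1 + u_{1j} + \varepsilon_{1j}, \qquad \varepsilon_{1j} \sim \mathcal{N}(0, \sigma_1^2) \\
y_{2j} &= \beta_2 + u_{2j} + \varepsilon_{2j}, \qquad \varepsilon_{2j} \sim \mathcal{N}(0, \sigma_2^2) \\
\begin{pmatrix} u_{1j} \\ u_{2j} \end{pmatrix}
  &\sim \mathcal{N}\!\left(
    \begin{pmatrix} 0 \\ 0 \end{pmatrix},
    \begin{pmatrix}
      \tau_1^2 & \rho\,\tau_1 \tau_2 \\
      \rho\,\tau_1 \tau_2 & \tau_2^2
    \end{pmatrix}
  \right)
\end{aligned}
```

If that reading is right, the correlation parameter \rho between the two intercepts is the only route by which y1 can inform a missing y2 in the same row.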
As I see it there are two ways to get estimates of the missing data.
(1) We could fit the model to all the data and estimate the random intercepts, then use predict to estimate the missing values of y2. This seems to work well in my numerical tests.
(2) We could fit the model to just the complete observations (i.e. rows with both y1 and y2), then estimate the missing values of y2 from the observed y1 in those rows, using the estimated correlation between y1 and y2. I’m attempting to do this with predict.brmsfit(), but I’m not sure it is doing what I intended.
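To make clear what I mean by (2), here is a base-R sketch of the calculation I have in mind. It is not brms code and not a claim about what predict.brmsfit() does internally. It uses simple moment estimates from the complete rows in place of posterior draws, and treats (y1, y2) as bivariate normal, so that E[y2 | y1] = mu2 + rho * (s2 / s1) * (y1 - mu1). The names cc, miss, y2_hat and sd_cond are made up for this illustration.

```r
# Hand calculation of the conditional mean of y2 given y1 (assumes bivariate normality).
set.seed(1)
n <- 100
z <- rnorm(n, 0, 1)
dat <- data.frame(y1 = rnorm(n, 8 * z), y2 = rnorm(n, -10 * z))

miss <- 1:20          # rows whose y2 we pretend is missing
cc <- dat[-miss, ]    # complete cases used for estimation

# Moment estimates from the complete cases only
mu1 <- mean(cc$y1)
mu2 <- mean(cc$y2)
s1 <- sd(cc$y1)
s2 <- sd(cc$y2)
rho <- cor(cc$y1, cc$y2)

# Conditional mean and conditional SD of y2 given the observed y1
y2_hat <- mu2 + rho * (s2 / s1) * (dat$y1[miss] - mu1)
sd_cond <- s2 * sqrt(1 - rho^2)

# Compare the hand-computed predictions with the held-out true values
print(data.frame(truth = dat$y2[miss], predicted = y2_hat))
```

With point estimates this is the same as a simple linear regression of y2 on y1 fitted to the complete rows. It gives me a baseline to compare against whatever predict returns for the rows with missing y2.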
- Windows 10
library(brms)
n <- 100
z <- rnorm(n, 0, 1)
dat2 <- data.frame(y1 = rnorm(n, 8*z), y2 = rnorm(n, -10*z), i = factor(1:n))
dat2mi <- dat2
dat2mi$y2[1:20] <- NA # make some missing entries
bf1mi <- bf(y1 ~ (1|ID|i))
bf2mi <- bf(y2|mi() ~ (1|ID|i))
m2 <- brm(bf1mi + bf2mi, family = gaussian(), data = dat2mi, chains = 2, cores = 2)