Estimating missing response data in multivariate model

cbrown5 · November 13, 2019, 1:07pm

Hello,
I’m attempting to fit a multivariate model with brms, with the aim of estimating (imputing) missing data for one of the response variables.
My question is, will brms use information from non-missing response variables when estimating missing values with the predict function?

Here’s my model:

bf1mi <- bf(y1 ~ (1|ID|i))
bf2mi <- bf(y2|mi() ~ (1|ID|i))

m2 <- brm(bf1mi + bf2mi,  family = gaussian(), data = dat2mi)

Here, y1 is complete and y2 has some missing (NA) entries. The variable i is just an indicator for the data rows, included so we can model correlated responses with ID (I think I’ve got that right?).

As I see it there are two ways to get estimates of the missing data.
(1) We could fit the model to all the data and estimate the random intercepts, then use predict to estimate the missing values of y2. This seems to work well in my numerical tests.
(2) We could fit the model to just the complete observations (i.e. rows with both y1 and y2), then estimate the missing values of y2 based on the estimated correlation with y1. I’m attempting to do this with predict.brmsfit(), but I’m not sure it is doing what I intended.
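
To make the target of (2) concrete, here is the standard conditional-normal result, written under the assumption that for a given row the pair (y1, y2) is jointly Gaussian. The symbols \mu_1, \mu_2, \sigma_1, \sigma_2 and \rho are introduced here only for illustration; they are not brms parameter names, and this does not show what predict.brmsfit() actually computes.

```latex
\begin{aligned}
\begin{pmatrix} y_1 \\ y_2 \end{pmatrix}
  &\sim \mathcal{N}\!\left(
    \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix},
    \begin{pmatrix} \sigma_1^2 & \rho\,\sigma_1\sigma_2 \\
                    \rho\,\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}
  \right) \\[4pt]
y_2 \mid y_1
  &\sim \mathcal{N}\!\left(
    \mu_2 + \rho\,\frac{\sigma_2}{\sigma_1}\,(y_1 - \mu_1),\;
    \sigma_2^2\,(1 - \rho^2)
  \right)
\end{aligned}
```

If the predictions for the missing rows really use the observed y1, their spread should shrink towards \sigma_2 \sqrt{1 - \rho^2} as the correlation gets stronger. If they ignore y1, the spread should stay close to \sigma_2.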

I’m on:

Windows 10
brms 2.10.0

Thank you
Chris

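Reproducible code:

library(brms)

n <- 100
z <- rnorm(n, 0, 1)  # shared latent variable that makes y1 and y2 correlated

dat2 <- data.frame(y1 = rnorm(n, 8 * z),
                   y2 = rnorm(n, -10 * z),
                   i = factor(1:n))  # one level per data row
dat2mi <- dat2
dat2mi$y2[1:20] <- NA  # make some missing entries

# the shared ID label requests correlated row-level intercepts across y1 and y2
bf1mi <- bf(y1 ~ (1 | ID | i))
bf2mi <- bf(y2 | mi() ~ (1 | ID | i))  # mi() marks y2 as having missing values

m2 <- brm(bf1mi + bf2mi,
          family = gaussian(),
          data = dat2mi, chains = 2, cores = 2)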
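
As a rough cross-check for approach (2) that does not involve brms at all, one could compute a plug-in conditional mean of y2 given y1 from the complete rows in base R. This is only a sketch, assuming (y1, y2) is bivariate normal. The seed and the names truth, cc, mu, S, slope, y2_hat and cond_sd are illustrative and not part of the original post, and the calculation ignores parameter uncertainty.

```r
# Plug-in conditional-normal imputation from complete cases (base R only).
set.seed(1)  # arbitrary seed so the simulated data are repeatable

n <- 100
z <- rnorm(n, 0, 1)
dat <- data.frame(y1 = rnorm(n, 8 * z),
                  y2 = rnorm(n, -10 * z))
truth <- dat$y2[1:20]   # keep the true values for comparison
dat$y2[1:20] <- NA      # same missingness pattern as in the post

cc <- complete.cases(dat)   # rows with both y1 and y2 observed
mu <- colMeans(dat[cc, ])   # sample means from complete rows
S  <- cov(dat[cc, ])        # sample covariance from complete rows

# Conditional mean of y2 given y1: mu2 + (S12 / S11) * (y1 - mu1)
slope  <- S["y1", "y2"] / S["y1", "y1"]
y2_hat <- unname(mu["y2"] + slope * (dat$y1[!cc] - mu["y1"]))

# Conditional standard deviation: sqrt(S22 - S12^2 / S11)
cond_sd <- sqrt(S["y2", "y2"] - S["y1", "y2"]^2 / S["y1", "y1"])

# Compare imputed and true values for the 20 missing rows
print(round(cbind(truth = truth, imputed = y2_hat), 2))
print(cond_sd)
```

If the brms predictions for the missing rows sit far from y2_hat, or their spread is much wider than cond_sd, that would hint that the predictions are not conditioning on the observed y1, though it would not prove it.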

martinmodrak · November 17, 2019, 2:26pm

Hi,
At first glance, your model looks OK. My first guess is that (1) would generally be preferable: although you won’t get any information about the correlation between y1 and y2 from the rows where y2 is missing, you’ll still get some information about the variability in y1. But I am no expert on this, so I’d rather check with @paul.buerkner for a second opinion.

What I am also not sure about is whether using the correlation is the best way to impute missing values. Maybe having something like y2 | mi() ~ y1 + 1||i could work better, but once again, I am not very sure about this.

Finally, you are using only two chains - is this because the model takes long to run? If so, are all the diagnostics OK?

Also, I’ve edited your post to better format source code (you can use triple backticks - ``` to do this)

Hope that helps

paul.buerkner · November 20, 2019, 2:20pm

I would agree that (1) seems preferable.

cbrown5 · November 24, 2019, 12:17pm

Cheers, thanks both.
@martinmodrak I used 2 chains just to get a quick result for the test; I’m using more for the real data.
I like the suggestion of y2 | mi() ~ y1 + 1||i. That would be an SEM, right (if we also model y1 as a second response)?
Thanks,
Chris

martinmodrak · November 24, 2019, 4:07pm

Not sure, I’ve always found those labels confusing and never managed to learn them right, so can’t help here, sorry :-)
