Missing data imputation for outcome missing for all rows for certain timepoints

jd_c · June 15, 2021, 4:30pm

This might be a really dumb question. I have data taken at 4 timepoints, with predictors measured at all 4 timepoints (with some values missing) and an outcome that was only measured at timepoints 1 and 4 (so it is completely missing at timepoints 2 and 3).
Is there any point in fitting a model where I impute missing data for the outcome variable, given that it is missing for every row at timepoints 2 and 3? I do have information on the predictors at those timepoints. What I was thinking of was something like the following code in brms:

brm(bf(outcome | mi() ~ 1 + time + mi(predictor1) + mi(predictor2) + (1|id)) +
		bf(predictor1 | mi() ~ 1 + (1|id)) +
		bf(predictor2 | mi() ~ 1 + (1|id)),
	data = data, family = gaussian())

I have fit something like this and compared it to a complete-case model, and the results are the same. I guess this is unsurprising, given that the outcome is missing for all rows at timepoints 2 and 3. I would have thought it might have lowered the standard errors slightly…

Thoughts?

jsocolar · June 15, 2021, 4:36pm

The only point of fitting these data-free timepoints would be to use the fit as a vehicle for getting predictions at those timepoints. Everything you know about the response at those timepoints comes from the response at the timepoints for which you have data, so the imputed responses at the data-free timepoints can’t tell you anything beyond what the measured responses already tell you.

jd_c · June 15, 2021, 4:45pm

Thanks! That is what I was thinking as well after fitting both.

jd_c · June 15, 2021, 4:50pm

Thinking about it a little more, I guess it could also have the benefit of better imputing missing values for the predictors. Say, for example, that a participant has values for predictor1 at timepoints 1, 2, and 3 but is missing it at timepoint 4. If you include the data for timepoints 2 and 3, then you can better impute a value for predictor1 at timepoint 4, where the outcome was also measured. Correct, @jsocolar? So in that respect it looks like it might help the model a little, assuming that the predictors have missing values at timepoints 1 and 4.

jsocolar · June 15, 2021, 4:55pm

It depends on what information you’re willing to leverage to impute the missing predictors. If you’re willing to assume that predictors are drawn from some common distribution across timesteps, then in principle you can fit these distributions across timesteps, and with more data from more timesteps you could get better inference. On the other hand, if you’re just imputing the probable values of missing predictors based on the observed value of the response and the modeled relationships between the predictors and the response (i.e. you’re not incorporating any information about the probable values of the predictors except by conditioning on the response) then you don’t get any benefit.

jd_c · June 15, 2021, 5:00pm

Thanks. That makes sense. In the model, I additionally predict missings for the predictors like this:
predictor1|mi() ~ 1 + (1|id) and I could also do predictor1|mi() ~ 1 + time + (1 + time|id). So this would seem to be what you are describing when you say “fit these distributions across timesteps”.
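The point about fitting predictor distributions across timesteps can be illustrated with a rough base-R simulation. This is only a sketch and not the brms model from this thread: it assumes a predictor with a stable per-person mean (all simulated values, `n_id`, and the standard deviations are made up for illustration), holds out the timepoint-4 value, and compares a guess that uses only timepoint 1 with a guess that also uses timepoints 2 and 3. A real multilevel model would additionally shrink the per-person estimates toward the overall mean.

```r
set.seed(1)

n_id <- 500                                  # hypothetical number of participants
id_mean <- rnorm(n_id, mean = 0, sd = 2)     # assumed stable per-person level

# Rows are participants, columns are timepoints 1..4.
# id_mean is recycled down each column, so row i is centred on id_mean[i].
x <- id_mean + matrix(rnorm(n_id * 4, sd = 1), nrow = n_id, ncol = 4)

truth <- x[, 4]                              # pretend predictor is missing at time 4

# Guess the missing value from the person's own observed values.
guess_t1_only  <- x[, 1]                     # only timepoint 1 is used
guess_t1_to_t3 <- rowMeans(x[, 1:3])         # timepoints 2 and 3 are used as well

rmse <- function(est) sqrt(mean((est - truth)^2))

rmse_t1_only  <- rmse(guess_t1_only)
rmse_t1_to_t3 <- rmse(guess_t1_to_t3)

print(c(t1_only = rmse_t1_only, t1_to_t3 = rmse_t1_to_t3))
```

Under these assumptions the guess that uses timepoints 1 to 3 should have the lower error, which is the sense in which rows with no outcome can still help: they inform the imputation of predictors at timepoints where the outcome was measured. If the predictor had no person-level structure shared across timepoints, the extra rows would not help in this way.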
