This might be a really dumb question. I have data that is taken at 4 timepoints, with predictors taken at all 4 timepoints (but some missing) and an outcome that was only taken at timepoint 1 and 4 (so completely missing at times 2 and 3).
Is there any point to fitting a model where I impute missing data for the outcome variable, since it is missing for every row at timepoints 2 and 3? I do have information on the predictors at those timepoints. What I was thinking was something like the following code in brms:
brm(bf(outcome | mi() ~ 1 + time + mi(predictor1) + mi(predictor2) + (1|id)) +
      bf(predictor1 | mi() ~ 1 + (1|id)) +
      bf(predictor2 | mi() ~ 1 + (1|id)),
    data = data, family = gaussian())
I have done something like this and compared it to a complete-case model, and the results are the same. I guess this is unsurprising, given that the outcome is missing for all rows at times 2 and 3. I would have thought it might lower the standard errors slightly…
The only point to fitting these data-free timepoints would be to use the fit as a vehicle to get predictions for those timepoints. Everything you know about the response at those timepoints comes from the response at the timepoints for which you have data, so the imputed responses at the data-free timepoints can’t tell you anything beyond what the measured responses already tell you.
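If predictions at the unobserved timepoints are the goal, a minimal sketch of how to extract them might look like the following. This assumes the fitted model object is called `fit` (not named in the thread) and uses the variable names from the model above:

```r
# Hypothetical sketch: posterior expected values of the outcome at the
# timepoints where it was never measured. Assumes `fit` is the brms fit
# from above and `data` contains rows for times 2 and 3 with predictor
# values (observed or to-be-imputed).
newdata <- subset(data, time %in% c(2, 3))
preds <- fitted(fit, newdata = newdata, resp = "outcome")
```

In a multivariate brms model, the `resp` argument selects which response variable to predict.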
Thanks! That is what I was thinking as well after fitting both.
Just thinking a little more, I guess it could additionally have the benefit of better imputing missing values for the predictors. Say, for example, that a participant has values for predictor1 at timepoints 1, 2, and 3 but is missing at timepoint 4. If you include the data for timepoints 2 and 3, you can better impute a value for predictor1 at timepoint 4, where the outcome also has a value. Correct, @jsocolar? So in that respect, it looks like it might help a little with the model, assuming the predictors have missing values at timepoints 1 and 4.
It depends on what information you’re willing to leverage to impute the missing predictors. If you’re willing to assume that predictors are drawn from some common distribution across timesteps, then in principle you can fit these distributions across timesteps, and with more data from more timesteps you could get better inference. On the other hand, if you’re just imputing the probable values of missing predictors based on the observed value of the response and the modeled relationships between the predictors and the response (i.e. you’re not incorporating any information about the probable values of the predictors except by conditioning on the response) then you don’t get any benefit.
Thanks. That makes sense. In the model, I additionally predict missings for the predictors like this:
predictor1 | mi() ~ 1 + (1|id), and I could also do
predictor1 | mi() ~ 1 + time + (1 + time|id). This would seem to be what you are describing when you say “fit these distributions across timesteps”.
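Putting that together, a sketch of the full model with time entering the predictor submodels (same variable names as above; this is one way to let the imputation borrow information across timepoints, not the only one) might be:

```r
# Sketch: outcome submodel as before, but the predictor submodels now
# model change over time, so observed values at times 2 and 3 inform
# the imputed predictor values at times 1 and 4.
brm(bf(outcome | mi() ~ 1 + time + mi(predictor1) + mi(predictor2) + (1|id)) +
      bf(predictor1 | mi() ~ 1 + time + (1 + time|id)) +
      bf(predictor2 | mi() ~ 1 + time + (1 + time|id)),
    data = data, family = gaussian())
```

Whether a person-specific time slope, (1 + time|id), is identifiable here depends on how many timepoints per participant are actually observed; with only four timepoints a simpler (1|id) + fixed time effect may be safer.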