Modeling missing data on outcomes in brms

Edit: I reformulated this question on GitHub, and that version might be clearer.
Maybe I don’t understand how mi() works? I’m learning about measurement error models.
If x has missing data and w is a proxy, I can do bf(y ~ mi(x)) + bf(x | mi() ~ w) .
I’m trying to understand how comparable this model is to y ~ MeasError(x, w) using the R package mecor.

MeasError(y, w) ~ x is a valid model. Can brms fit a comparable one?

Original Post Below

I am working on a modeling problem with missing data on the outcomes. My goal is to fit a bivariate regression.

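    library(data.table)

    # simulate data
    N <- 100
    m <- 20  # number of observed outcomes (assumed; m was not defined in the original post)
    x <- rnorm(N, 0, 1)
    y <- x + rnorm(N, 0, 0.5) + 1   # true outcome
    w <- y + rnorm(N, 0, 0.5)       # noisy proxy for y
    df <- data.table(x, y, w)

    # keep y only for a random subset of m rows; y.obs is NA elsewhere
    if (m < N) {
        df[sample(nrow(df), m), y.obs := y]
    } else {
        df[, y.obs := y]
    }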

If missing data is on the RHS, the following model works great:

    model <- brm(formula=bf(x ~ mi(y.obs)) + bf(y.obs | mi() ~ w) + set_rescor(FALSE),data=df)

But if the missing data is on the LHS, I get an error in fitting the model: invalid type (list) for variable 'mi(y.obs)'

    model <- brm(formula=bf(y.obs | mi() ~ w) + bf(mi(y.obs) ~ x) + set_rescor(FALSE), data=df)

bf(y.obs | mi() ~ w) + bf(mi(y.obs) ~ x) + set_rescor(FALSE) outputs

y.obs | mi() ~ w 
mi(y.obs) ~ x

Which looks right to me, but I don’t understand why brms is having trouble.

  • Operating System: Rocky Linux release 8.4 (Green Obsidian)
  • brms Version: 2.16.3

Thank you!

Howdy. You don’t define “mu_w” and “sd_w” in your simulation, so I couldn’t run it, but if you are simply trying to impute missing data for the outcome, then you simply write y.obs | mi() ~ w and that’s it. If you have a predictor that is also missing, say w, then you would write something like:

y.obs | mi() ~ mi(w)
w | mi() ~ 1

if you wanted to impute missings for both the outcome and predictor.
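
For anyone following along, here is a minimal sketch of that two-formula setup, reusing the simulation from the original post. The data frame d, the number of missing w values, and the object names f_both and fit_both are my own assumptions for illustration, not something from the thread.

```r
library(brms)

set.seed(1)
N <- 100
x <- rnorm(N, 0, 1)
y <- x + rnorm(N, 0, 0.5) + 1
w <- y + rnorm(N, 0, 0.5)

d <- data.frame(y.obs = y, w = w)
d$y.obs[sample(N, 80)] <- NA  # 20 observed outcomes, as in the original post
d$w[sample(N, 10)] <- NA      # assumption: 10 missing predictor values

# One sub-model per variable with missing values; mi(w) on the right-hand
# side uses the imputed values of w when predicting y.obs.
f_both <- bf(y.obs | mi() ~ mi(w)) +
  bf(w | mi() ~ 1) +
  set_rescor(FALSE)

# Fitting compiles a Stan model, so it only runs in an interactive session.
if (interactive()) fit_both <- brm(f_both, data = d)
```

Each variable that gets imputed needs its own formula with | mi() on the left, which is why w appears twice.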

Thanks for the response!

Sorry about the missing variables. I’ve replaced them with constants.
I’m trying to impute missing data and fit a linear regression to the imputed data in the same model.

Yes, with y.obs | mi() ~ w you are imputing missing data in the outcome ‘on the fly’ as you fit your model… Maybe I am not understanding what you are trying to do.

Sorry, what I’m doing isn’t super clear.
The scenario is:
I want to fit y ~ x, but some values of y are missing. I can measure w, which is a reasonably good approximation of y. So I want to impute y using w and also estimate y ~ x.

If we swap x and y and x is the variable with missing data then bf(y ~ mi(x.obs)) + bf(x.obs | mi() ~ w) works great. So my question is really about how to fit regression models with missing data on the LHS.

I just realized a typo in the original question was a source of confusion. It’s corrected now.

I don’t know if brms can do this unless there is a non-linear formula sort of hack…

That’s unfortunate. I guess I’ll make a brms feature request.

Maybe ask @paul.buerkner to see if it is already possible. I just don’t know.

You could use “overimputation”, that is, imputation while accounting for measurement error. If you know the measurement error of w with respect to the true (but unknown) y, you can do w | mi(se_w) ~ x to achieve your goal.
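
A minimal sketch of what that could look like with the simulated data from the original post. I am assuming the measurement error SD is known and equal to the 0.5 used to generate w; the column name se_w and the objects f_over and fit_over are just illustrative.

```r
library(brms)

set.seed(1)
N <- 100
x <- rnorm(N, 0, 1)
y <- x + rnorm(N, 0, 0.5) + 1   # true outcome, treated as unknown here
w <- y + rnorm(N, 0, 0.5)       # noisy measurement of y

# se_w holds the known measurement error SD of w for each row
# (assumption: 0.5, matching the simulation above).
df_over <- data.frame(x = x, w = w, se_w = 0.5)

# The argument to mi() is the known SD of the response's measurement error,
# so the regression on x is for the latent true value behind w.
f_over <- bf(w | mi(se_w) ~ x)

# Fitting compiles a Stan model, so it only runs in an interactive session.
if (interactive()) fit_over <- brm(f_over, data = df_over)
```

Note that this sketch does not use the rows where y was actually observed; how best to combine those with w is not covered in this thread.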
