Transformation of mi() in the main outcome model

MartyHofstra · May 14, 2021, 3:15pm

Hi there,

For my thesis I have to perform multiple imputation during model fitting, and compare the results to a model without multiple imputation during model fitting. Unfortunately, I cannot manage to linearly transform the mi() variable in the main outcome model. In short, I want to do the following:

Estimate the missing Troponin_Second_Measure values in a ‘sub-model’
Calculate (not estimate) the Troponin_Difference based on the Troponin_First_Measure (observed, this variable does not contain missing values) and Troponin_Second_Measure (observed + imputed) values.
Estimate the main outcome model component

Since multiple imputation during model fitting is a ‘one-step approach’, all above mentioned steps have to be performed at once. Currently, the two models (with and without multiple imputation during model fit) in R look as follows:

model1 <-
  quiet(
    brm(
      data = df,
      formula = Cardiac_Outcome ~ log(Troponin_First_Measure) + I(Troponin_Second_Measure-Troponin_First_Measure) + Physiological_Platec +  Angiogram  + Gender + Physiological_Creatine + (Troponin_First_Measure|ECG_Ischaemia),
      prior =  c(set_prior("student_t(3,0,5)"),
                set_prior("lkj(2)", class = "cor")),
      family = categorical(refcat = "Normal"),
      iter = 500, chains = 4, cores = 4,
      control = list(adapt_delta = 0.96, max_treedepth = 12)))

&

model2 <-
 quiet(
brm(
  data = df,
  formula = bf(Cardiac_Outcome ~ log(Troponin_First_Measure) + I(mi(Troponin_Second_Measure) - Troponin_First_Measure) + Physiological_Platec +  Angiogram  + Gender + Physiological_Creatine +  (Troponin_First_Measure|ECG_Ischaemia)) + 
             bf(Troponin_Second_Measure| mi() ~ Troponin_First_Measure + Physiological_Creatine +  Physiological_Platec + Gender) + set_rescor(FALSE),
  prior = c(set_prior("student_t(3,0,5)"),
            set_prior("lkj(2)", class = "cor")),
  family = list(categorical(refcat = "Normal"),
                gaussian()),
  iter = 500, chains = 4, cores = 4,
  control = list(adapt_delta = 0.96, max_treedepth = 12)))

Unfortunately, I get the following error when running the second model:

Error in stanc(file = file, model_code = model_code, model_name = model_name, : failed to parse Stan model ‘2f963b3c6056c1414f07e7c5ec2b7f85’ due to the above error

The subtraction

I(mi(Troponin_Second_Measure)-Troponin_First_Measure)

in the second model is not possible whereas the subtraction

I(Troponin_Second_Measure)-Troponin_First_Measure)

in the first model is possible. Hence, it is most likely that the error is caused by trying to perform a linear transformation on the mi() variable. Here it says that the formula doesn’t currently allow imputed variables to have non-linear effects in the main outcome model. However, is there any way of performing the subtraction mentioned above in either R or STAN? In other words, are linear transformations of the imputed variables possible?

Note: Calculating Troponin_Difference before fitting the model and imputing this variable leads to suboptimal results since fitting Troponin_Second_Measure in the ‘sub-model’ is more suitable because of the nature of the data. Namely, Troponin_Difference is harder to model based on domain knowledge/literature and it contains many zeroes.

Note 2: The I() function allows arithmetic operators in a formula in R.

Note 3: Unfortunately, I cannot share the data since it is confidential patient data. However, I have added the STAN code from a model where Troponin_Difference is calculated before fitting the model and then imputed (this model runs without any errors). See the R code and STAN code (see uploaded file) below. My guess is that in the model{} block something has to be adjusted in the lines under

// vector combining observed and missing responses

(see text highlighted in red). However, I’m not familiar with writing STAN code and I have no idea on how to proceed.

Thanks in advance!
Marty

model3 <-
quiet(
brm(
  data = df,
  formula = bf(Cardiac_Outcome ~ log(Troponin_First_Measure) + mi(Troponin_Difference) + Physiological_Platec +  Angiogram  + Gender + Physiological_Creatine +  (Troponin_First_Measure|ECG_Ischaemia)) + 
             bf(Troponin_Difference| mi() ~ Troponin_First_Measure + Physiological_Creatine +  Physiological_Platec + Gender) + set_rescor(FALSE),
  prior = c(set_prior("student_t(3,0,5)"),
            set_prior("lkj(2)", class = "cor")),
  family = list(categorical(refcat = "Normal"),
                gaussian()),
  iter = 500, chains = 4, cores = 4,
  control = list(adapt_delta = 0.96, max_treedepth = 12)))

model3.stan.txt (17.6 KB)

martinmodrak · May 21, 2021, 1:46pm

Hi, sorry for not getting to you earlier. It seems this is not currently supported and I don’t think this could be done easily - when using I(a - b) (with non-missing data), the subtraction is done when preprocessing the data, but that’s not an option for missing data. In fact, just taking a function (e.g. having I(mi(Troponin_Second_Measure) ^ 2)) works.

If you really need this, then you might be able to edit the raw Stan code generated by brms to make this happen (feel free to ask for direction on how to achieve this).

For posterity, I’ll also share an example to show this problem on example data:

So, using a function (works):

library(brms)
options(brms.backend = "cmdstanr")
data("nhanes", package = "mice")
bform <- bf(bmi | mi() ~ I(mi(chl)^2)) +
 bf(chl | mi() ~ age) + set_rescor(FALSE)
fit_imp2 <- brm(bform, data = nhanes)

Using data (does not work):

bform <- bf(bmi | mi() ~ I(mi(chl) - age)) +
 bf(chl | mi() ~ age) + set_rescor(FALSE)

fit_error <- brm(bform, data = nhanes)

Best of luck with your analysis!

martinmodrak · May 22, 2021, 3:33pm

Two things I forgot to note:

If you instead model your data with + mi(Troponin_Second_Measure) + Troponin_First_Measure , you get the desired model as a special case where b_Troponin_Second_Measure = -b_Troponin_First_Measure. So if you can tolerate the bit of additional flexibility and the two measures are not too strongly correlated, this could actually work quite nice.
You have adjusted adapt_delta and max_treedepth, presumably to address some divergences. It is quite likely those point at some problem with the model that if resolved could help you learn more from your data. We’d be happy to help you in diagnosing this, but we’d need to see some summaries from the model (e.g. pairs plots for suitable subsets of the parameters) and know more about the data.

Topic		Replies	Views
Missing data of main effects in model with interaction terms brms missing-data	17	3153	October 4, 2022
Brms: mi() for discrete outcomes in an IRT model brms specification , irt , missing-data	13	1937	June 18, 2021
Using mi() to generate same imputed value for all levels of a group brms	2	543	March 30, 2020
Losing observation when modelling missing values brms	9	652	April 11, 2019
Wrapping mi() inside a (centering) function brms	6	905	May 6, 2022

Transformation of mi() in the main outcome model

Related topics