Modeling unbalanced multivariate outcomes in brms

Hi, I’m trying to fit a basic unbalanced multivariate outcomes model in brms that I can then build upon. These data are from a behavioral paradigm (Stroop). The paradigm has two trial events/conditions: congruent and incongruent. However, I only examine responses to correct trials, which means that the data are unbalanced because people commit errors.

Here is an example of how the data might look (the values are made up). Some rows have a measurement for one event (e.g., congruent) but not for the other (incongruent).

id: participant ID
trial: trial number… 1 = first trial, 2 = second trial, etc.
congruent: response for a congruent trial
incongruent: response for an incongruent trial

NA = empty

id   trial  congruent  incongruent
100  1      300        350
100  2      310        345
100  3      305        NA
100  4      310        NA
101  1      300        350
101  2      310        345
101  3      NA         340
102  1      300        350
102  2      310        345
102  3      305        350
102  4      306        346
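
For anyone who wants to reproduce the layout, here is a minimal sketch that builds the made-up table above as a data frame in base R and counts the incomplete rows. The object names (df_wide, n_missing, n_complete) are only illustrative; df_wide matches the name used in the model call further down.

```r
# Made-up example data from the post, in wide format (one row per id x trial).
df_wide <- read.table(header = TRUE, text = "
id  trial congruent incongruent
100 1     300       350
100 2     310       345
100 3     305       NA
100 4     310       NA
101 1     300       350
101 2     310       345
101 3     NA        340
102 1     300       350
102 2     310       345
102 3     305       350
102 4     306       346
")

# Number of missing responses per outcome column.
n_missing <- colSums(is.na(df_wide[, c("congruent", "incongruent")]))

# Number of rows where both outcomes are observed.
n_complete <- sum(complete.cases(df_wide))

print(n_missing)
print(n_complete)
```

Rows with an NA in either outcome are the ones a complete-case analysis would discard.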

I have data on about a hundred participants and a few hundred trials of each condition for everyone.

If I had the same number of observations for congruent and incongruent trials, I could model the data using the following…

brm_fit <- brm(
  bf(congruent ~ 0 + (1+trial|p|id)) + 
    bf(incongruent ~ 0 + (1+trial|q|id)),
  data = df_wide,
  chains = 4,
  cores = 4,
  iter = 1000)

But when I try to model the unbalanced data, brms drops every row that contains an NA. How can I go about running this model with unbalanced observations?

Thank you very much for any insight!

  • Operating System: OS X 10.14.16
  • brms Version: 2.9.0

I’m not sure I understand how essential it is that the data are unbalanced, but you could always impute the missing values when you fit your model.

brm_fit <- brm(
  bf(congruent | mi() ~ 0 + (1+trial|p|id)) + 
    bf(incongruent | mi() ~ 0 + (1+trial|q|id)),
  data = df_wide,
  chains = 4,
  cores = 4,
  iter = 1000)

Thanks for your response. I thought of modeling the data as missing, but I’m not sure that is theoretically defensible. I’m not actually “missing” data, because I am using all correct trials. The responses definitely are not missing at random. It’s not like there is a random hardware malfunction for some trials or something similar.

At least that’s how I reasoned that I can’t model the responses as missing. I could be wrong though.

@paul.buerkner, I was wondering whether you had any thoughts? I apologize for the direct tag. I’m still banging my head against the wall, and I know this is probably a straightforward reply for you :) Can I even do this in brms?

Does anything speak against converting the data to long format, which would allow you to simply omit the trials with incorrect responses?

Something like rt ~ 1 + (1 + trialtype/trial) + (1 | id)

I realize this isn’t exactly what your original formula does, but I am not sure what you were grouping with |p| and |q|.
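
The wide-to-long conversion suggested here could be sketched in base R roughly as follows. This assumes a wide data frame shaped like the example in the first post (only two of the made-up participants are repeated to keep it short); the column names rt and trialtype follow the formula above, and everything else is illustrative.

```r
# Two participants from the made-up example, in wide format.
df_wide <- data.frame(
  id          = c(100, 100, 100, 100, 101, 101, 101),
  trial       = c(1, 2, 3, 4, 1, 2, 3),
  congruent   = c(300, 310, 305, 310, 300, 310, NA),
  incongruent = c(350, 345, NA, NA, 350, 345, 340)
)

# Stack the two outcome columns into one rt column, labelled by trial type.
df_long <- rbind(
  data.frame(id = df_wide$id, trial = df_wide$trial,
             trialtype = "congruent", rt = df_wide$congruent),
  data.frame(id = df_wide$id, trial = df_wide$trial,
             trialtype = "incongruent", rt = df_wide$incongruent)
)

# Drop the rows without a response; no imputation is involved.
df_long <- df_long[!is.na(df_long$rt), ]
df_long$trialtype <- factor(df_long$trialtype)

print(table(df_long$trialtype))
```

In long format an unobserved response is simply an absent row, so the unequal number of congruent and incongruent observations needs no special handling.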

Thank you for your thoughts.

I am not opposed to using long-format data. But I need to model the varying effects of trial type (congruent, incongruent) as correlated. I’m interested in the between-trialtype covariances. This is why I have been trying to use the multivariate model. I am not sure how to write a multivariate outcomes model using the long format… I’ll keep looking!

I would recommend the following model, which uses the long format as @Guido_Biele suggested:
all RTs are stacked in a single column, with a new variable trialtype indicating the condition. With that, we could use

rt ~ 0 + trialtype + (1 | trial) + (0 + trialtype | id) 

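That way, you estimate both trial types in the same model while accounting for the dependencies
between the two observations per trial via a varying intercept. The (0 + trialtype | id) term gives each participant a separate effect per trial type and estimates the correlation between them.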
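
As a rough sketch, the formula above could be passed to brm() like this. It assumes a long data frame with columns rt, trialtype, trial and id; the helper name fit_long_model is hypothetical, and the sampler settings are simply copied from the original post rather than recommended values.

```r
# Formula from the recommendation above: one population-level mean per
# trial type, a varying intercept per trial number, and correlated
# by-participant effects for the two trial types.
long_formula <- rt ~ 0 + trialtype + (1 | trial) + (0 + trialtype | id)

# Hypothetical helper wrapping the brm() call (requires the brms package
# and a working Stan toolchain when it is actually called).
fit_long_model <- function(df_long) {
  brms::brm(
    long_formula,
    data   = df_long,
    chains = 4,
    cores  = 4,
    iter   = 1000
  )
}

# Usage (not run here):
# fit <- fit_long_model(df_long)
# brms::VarCorr(fit)$id$cor  # correlation between the by-participant effects
```

The correlation reported for the id grouping factor is the quantity that corresponds to the between-trialtype covariance asked about earlier in the thread.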

Excellent. This worked.

Thank you everyone for your help! I appreciate it.