Using brms to model missing time-invariant predictors in longitudinal data

@paul.buerkner I have a specific brms question about using the missing data syntax to model missing time-invariant predictors in longitudinal data.

For a concrete example, the simulated dataset below has five subjects, each with three observations, and x1 and x2 are time-invariant predictors. Suppose I use the brms missing data syntax to fit an example random intercept model:

b_formula <-
  bf(y ~ x1 + mi(x2) + t + (1 | id)) +
  bf(x2 | mi() ~ x1)

then the missing data model bf(x2 | mi() ~ x1) will include num_subjects * num_times = 15 values of x1 and x2, because the covariates are stored in long format. However, there are really only num_subjects = 5 unique values to model, because we assume they don’t change over time.

Is there a way to account for this in brms?

I suppose it would require a different size data frame to be used in bf(y ~ x1 + mi(x2) + t + (1 | id)) and bf(x2 | mi() ~ x1).

Simulated data

num_subjects <- 5
num_times <- 3

# Create example longitudinal data
t <- rep(1:num_times, num_subjects)
id <- rep(c(1:num_subjects), each = num_times)
x1 <- rep(rnorm(n = num_subjects, mean = 0, sd = 1), each = num_times)
x2 <- rep(rnorm(n = num_subjects, mean = 0, sd = 1), each = num_times)
y <- rnorm(n = length(t), mean = 0, sd = 1)

df <- data.frame(id, t, x1, x2, y)

# Set the values of x2 for participant 1 to missing
df$x2[df$id == 1] <- NA
print(df)
   id t         x1         x2           y
1   1 1  0.7643956         NA -0.92300226
2   1 2  0.7643956         NA  0.42074257
3   1 3  0.7643956         NA  0.06916378
4   2 1  0.7994848  0.7718577 -0.33242496
5   2 2  0.7994848  0.7718577  0.08503542
6   2 3  0.7994848  0.7718577  0.59117703
7   3 1 -0.3170920 -1.0295577  0.88898984
8   3 2 -0.3170920 -1.0295577  1.64393372
9   3 3 -0.3170920 -1.0295577 -1.06529484
10  4 1  0.3731299 -0.4514906 -0.07557917
11  4 2  0.3731299 -0.4514906  1.24197831
12  4 3  0.3731299 -0.4514906  1.10449441
13  5 1  2.1084877 -0.1122207 -1.93008626
14  5 2  2.1084877 -0.1122207  2.06173977
15  5 3  2.1084877 -0.1122207  2.05245378
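
To make the mismatch concrete, here is a small base R sketch that counts the rows the x2 submodel would see in long format against the number of distinct subject-level values. It is only an illustration: the data frame name toy and the seed are my own choices, so the simulated values will not match the printout above.

```r
set.seed(123)  # arbitrary seed; values differ from the printout above
num_subjects <- 5
num_times <- 3

# Same layout as the example data, stored in a separate object
toy <- data.frame(
  id = rep(seq_len(num_subjects), each = num_times),
  t  = rep(seq_len(num_times), times = num_subjects)
)
toy$x1 <- rep(rnorm(num_subjects), each = num_times)
toy$x2 <- rep(rnorm(num_subjects), each = num_times)
toy$x2[toy$id == 1] <- NA

# Rows the long-format submodel for x2 would use
n_long <- nrow(toy)                                    # 15

# Distinct subject-level rows, i.e. what we actually want to model
n_subject <- nrow(unique(toy[, c("id", "x1", "x2")]))  # 5

# The single missing x2 value shows up as three missing rows
n_missing_long <- sum(is.na(toy$x2))                   # 3
```

So in long format the one unknown x2 for subject 1 would be treated as three separate missing values unless the submodel is restricted to one row per subject.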

Could the “idx” argument of mi() terms achieve what you want? You can then add a subset() addition term to the x2 formula to ensure that x2 is really treated as having just 5 unique values. The subset syntax achieves within a single data.frame what would otherwise require multiple data sets. See also ?resp_subset and ?brmsformula.

This worked great, thank you!

Max, can you please post the model formula that you ended up using?

@joels Sorry, I just saw your message! Here’s what I used:

brms_formula <-
  bf(y ~ mi(x_missing, idx = id) + t + z + (1 + t | id)) +
  bf(x_missing | mi() + subset(t1) + index(id) ~ z)

where y is the outcome variable, x_missing is the partially missing covariate, z is the fully observed covariate, and t1 is a logical indicator (TRUE / FALSE) marking the first observation of each subject.
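
For anyone adapting this, below is a minimal sketch of one way the t1 indicator could be built. The helper add_first_obs_flag() and the toy data frame dat (with made-up values) are hypothetical and only for illustration; this is not necessarily how the indicator was created for the model above. The brms part only constructs the formula and is skipped if brms is not installed; the call to brm() is left commented out and has not been checked against a particular brms version.

```r
# Hypothetical helper: sort by subject and time, then flag the first row
# of each subject so that exactly one row per subject has t1 == TRUE.
add_first_obs_flag <- function(data, id_col = "id", time_col = "t") {
  data <- data[order(data[[id_col]], data[[time_col]]), , drop = FALSE]
  data$t1 <- !duplicated(data[[id_col]])
  data
}

# Made-up toy data: 5 subjects, 3 time points, time-invariant z and x_missing
set.seed(1)
dat <- data.frame(
  id        = rep(1:5, each = 3),
  t         = rep(1:3, times = 5),
  z         = rep(c(0.2, -1.1, 0.5, 1.3, -0.4), each = 3),
  x_missing = rep(c(NA, 0.8, -1.0, -0.5, -0.1), each = 3)
)
dat$y <- rnorm(nrow(dat))

dat <- add_first_obs_flag(dat)

# Rows that would enter the x_missing submodel via subset(t1)
n_submodel_rows <- sum(dat$t1)

if (requireNamespace("brms", quietly = TRUE)) {
  library(brms)
  # idx = id tells the y model which x_missing value to use for each row;
  # index(id) labels the (unique) rows kept by subset(t1).
  brms_formula <-
    bf(y ~ mi(x_missing, idx = id) + t + z + (1 + t | id)) +
    bf(x_missing | mi() + subset(t1) + index(id) ~ z) +
    set_rescor(FALSE)
  # fit <- brm(brms_formula, data = dat)  # not run here
}
```

With this flag, the x_missing submodel sees one row per subject (5 here), while the y submodel still uses all 15 rows.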


No problem. Thanks for posting.