@paul.buerkner I have a specific `brms` question about using the missing data syntax to model missing time-invariant predictors in longitudinal data.
For a concrete example, the simulated dataset below has five subjects, each with three observations, and `x1` and `x2` are time-invariant predictors. Suppose I use the `brms` missing data syntax to fit an example random intercept model:
```r
library(brms)

b_formula <-
  bf(y ~ x1 + mi(x2) + t + (1 | id)) +
  bf(x2 | mi() ~ x1)
```
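The model will then include `num_subjects * num_times` = 15 values of `x1` and `x2` in the missing data model `bf(x2 | mi() ~ x1)`, because it uses the long data format for the covariates. However, there are really only `num_subjects` = 5 unique values to model, because we assume they don't change over time. For example, subject 1's single unknown `x2` is treated as three separately imputed missing values, and each observed subject's `x2` enters the imputation model three times.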
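To make the counting concrete, here is a base R sketch (no `brms` needed) that rebuilds data with the same structure as the simulated example below and compares the long-format counts with the subject-level counts. The `set.seed(1)` call is an addition for reproducibility only, so the values will differ from the printed output.

```r
# Rebuild data with the same structure as the simulated example
set.seed(1)  # added for reproducibility; values differ from the printed output
num_subjects <- 5
num_times <- 3
df <- data.frame(
  id = rep(1:num_subjects, each = num_times),
  t  = rep(1:num_times, num_subjects),
  x1 = rep(rnorm(num_subjects), each = num_times),
  x2 = rep(rnorm(num_subjects), each = num_times),
  y  = rnorm(num_subjects * num_times)
)
df$x2[df$id == 1] <- NA

# What the long-format imputation model bf(x2 | mi() ~ x1) sees
n_rows_long <- nrow(df)                # 15 rows
n_missing_long <- sum(is.na(df$x2))    # 3 missing cells, all from subject 1

# What actually varies: one value per subject
n_subjects <- length(unique(df$id))                        # 5 subjects
n_missing_subjects <- length(unique(df$id[is.na(df$x2)]))  # 1 subject

cat("long rows:", n_rows_long, "| missing cells:", n_missing_long, "\n")
cat("subjects:", n_subjects, "| subjects missing x2:", n_missing_subjects, "\n")
```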
Is there a way to account for this in `brms`? I suppose it would require data frames of different sizes for `bf(y ~ x1 + mi(x2) + t + (1 | id))` (one row per observation) and `bf(x2 | mi() ~ x1)` (one row per subject).
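A minimal sketch of what the two differently sized data frames could look like, assuming a subject-level table `subj` with one row per `id` and a hypothetical integer index `subj_idx` linking each long-format row to its subject. This is only data preparation in base R; it does not show how (or whether) `brms` can consume the two tables together. The seed is again added only for reproducibility.

```r
set.seed(1)  # added for reproducibility
num_subjects <- 5
num_times <- 3
df <- data.frame(
  id = rep(1:num_subjects, each = num_times),
  t  = rep(1:num_times, num_subjects),
  x1 = rep(rnorm(num_subjects), each = num_times),
  x2 = rep(rnorm(num_subjects), each = num_times),
  y  = rnorm(num_subjects * num_times)
)
df$x2[df$id == 1] <- NA

# Subject-level table: one row per id with the time-invariant covariates
subj <- df[!duplicated(df$id), c("id", "x1", "x2")]
rownames(subj) <- NULL

# Hypothetical index linking each long-format row to its row in `subj`
df$subj_idx <- match(df$id, subj$id)

# Expanding the subject-level values through the index recovers the long columns
x1_long <- subj$x1[df$subj_idx]
x2_long <- subj$x2[df$subj_idx]

str(subj)
```

With this layout the imputation model would only need to see 5 rows (one unknown `x2`), while the outcome model keeps all 15 rows and looks up each subject's covariates through `subj_idx`.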
Simulated data
```r
num_subjects <- 5
num_times <- 3

# Create example longitudinal data (long format: one row per subject and time)
t <- rep(1:num_times, num_subjects)
id <- rep(1:num_subjects, each = num_times)
# Time-invariant predictors: one draw per subject, repeated across times
x1 <- rep(rnorm(n = num_subjects, mean = 0, sd = 1), each = num_times)
x2 <- rep(rnorm(n = num_subjects, mean = 0, sd = 1), each = num_times)
y <- rnorm(n = length(t), mean = 0, sd = 1)
df <- data.frame(id, t, x1, x2, y)

# Set the values of x2 for participant 1 to missing
df$x2[df$id == 1] <- NA
```
```
> print(df)
   id t         x1         x2           y
1   1 1  0.7643956         NA -0.92300226
2   1 2  0.7643956         NA  0.42074257
3   1 3  0.7643956         NA  0.06916378
4   2 1  0.7994848  0.7718577 -0.33242496
5   2 2  0.7994848  0.7718577  0.08503542
6   2 3  0.7994848  0.7718577  0.59117703
7   3 1 -0.3170920 -1.0295577  0.88898984
8   3 2 -0.3170920 -1.0295577  1.64393372
9   3 3 -0.3170920 -1.0295577 -1.06529484
10  4 1  0.3731299 -0.4514906 -0.07557917
11  4 2  0.3731299 -0.4514906  1.24197831
12  4 3  0.3731299 -0.4514906  1.10449441
13  5 1  2.1084877 -0.1122207 -1.93008626
14  5 2  2.1084877 -0.1122207  2.06173977
15  5 3  2.1084877 -0.1122207  2.05245378
```
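As a quick sanity check on the time-invariance assumption, the sketch below counts the distinct non-missing values of each covariate within each subject. `n_distinct_per_id` is a hypothetical helper, and the seed is added only so the sketch is reproducible (its values will not match the output above).

```r
set.seed(1)  # added for reproducibility
num_subjects <- 5
num_times <- 3
df <- data.frame(
  id = rep(1:num_subjects, each = num_times),
  t  = rep(1:num_times, num_subjects),
  x1 = rep(rnorm(num_subjects), each = num_times),
  x2 = rep(rnorm(num_subjects), each = num_times),
  y  = rnorm(num_subjects * num_times)
)
df$x2[df$id == 1] <- NA

# Hypothetical helper: number of distinct non-missing values of v within each id
n_distinct_per_id <- function(v, id) {
  tapply(v, id, function(z) length(unique(z[!is.na(z)])))
}

x1_counts <- n_distinct_per_id(df$x1, df$id)  # 1 for every subject
x2_counts <- n_distinct_per_id(df$x2, df$id)  # 0 for subject 1 (all NA), 1 otherwise
print(rbind(x1 = x1_counts, x2 = x2_counts))
```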