Hi all, I’m working on an experiment in which a predictor variable was measured several times (with error) and an outcome variable was measured once.
What I would like to do is estimate the “true” value of the predictor from the noisy replicated measurements, and use that estimate to predict the outcome. So, a kind of two-stage approach: 1) estimate the mean of the predictor for each participant, 2) model the relationship between the estimated predictor mean and the outcome. Ideally both regressions would be fit in the same Markov chain. I have attempted this in brms to no avail (I’m still a bit intimidated by Stan…).
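In model notation, the two stages could be written roughly as follows. The symbols are shorthand assumed for illustration: $x_{ij}$ is replicate $j$ for participant $i$, $\mu_i$ is that participant's latent “true” predictor value, $y_i$ is the outcome, and $\alpha$, $\beta$, $\sigma_x$, $\sigma_y$ are the intercept, slope, and the two residual standard deviations. Normal errors are assumed because that is how the simulated data are generated.

```latex
\begin{aligned}
x_{ij} &\sim \mathrm{Normal}(\mu_i, \sigma_x), \quad j = 1, \dots, 5 \\
y_i &\sim \mathrm{Normal}(\alpha + \beta \mu_i, \sigma_y)
\end{aligned}
```

Fitting both lines in one model would let the uncertainty in each $\mu_i$ carry into the estimate of $\beta$.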
Here is some representative simulated data:
library(brms)
library(tidyverse)
# simulate noisy x data (seed fixed so the example is reproducible)
set.seed(123)
d <- tibble(
  true_x = rnorm(200),
  y = 0.5 * true_x + rnorm(200),
  # observed x values
  x1 = true_x + rnorm(200, sd = 0.5),
  x2 = true_x + rnorm(200, sd = 0.5),
  x3 = true_x + rnorm(200, sd = 0.5),
  x4 = true_x + rnorm(200, sd = 0.5),
  x5 = true_x + rnorm(200, sd = 0.5)
)
d$ID <- factor(seq_len(nrow(d)))
I can estimate the means for each participant:
d_long <- pivot_longer(
  d, -c(ID, y, true_x),
  names_to = "rep", values_to = "measured_x"
)
# model
m1 <- brm(
  measured_x ~ 0 + ID,
  data = d_long,
  prior = prior(normal(0, 1), class = "b"),
  warmup = 1500, iter = 3000,
  cores = 4, backend = "cmdstanr"
)
# save draws
post <- as_draws_df(m1) %>% select(starts_with("b_ID"))
But that isn’t particularly helpful, since I can’t (to my knowledge) use those posteriors in the next model stage.
An ideal solution would be to fit these two models simultaneously. But I’m not sure if this is possible given that each model requires a different dataset (d_long for model 1, d for model 2). Potentially there’s a solution with the subset() addition term in brms, but I haven’t managed to figure it out.
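For reference, here is a rough, untested sketch of one direction that sidesteps the two-dataset problem. It is not the full joint model described above: the replicates are first collapsed to one row per participant (mean and standard error), and those summaries are passed to the measurement-error syntax in brms, `mi()`. The names `x_obs`, `d_sum`, `x_mean`, `x_se`, and `fit_mi_model` are hypothetical, the data are re-simulated in base R so the block stands alone, and default priors are assumed. A standard error based on only five replicates is itself noisy, which this sketch ignores.

```r
library(brms)

# Hypothetical re-creation of the simulated data (base R, fixed seed)
set.seed(1)
n <- 200
true_x <- rnorm(n)
y <- 0.5 * true_x + rnorm(n)
# n x 5 matrix: five noisy replicates of true_x per participant
x_obs <- replicate(5, true_x + rnorm(n, sd = 0.5))

# One row per participant: replicate mean and its standard error
d_sum <- data.frame(
  ID = factor(seq_len(n)),
  y = y,
  x_mean = rowMeans(x_obs),
  x_se = apply(x_obs, 1, sd) / sqrt(ncol(x_obs))
)

# Wrapped in a function so no Stan model is compiled until it is called.
# The second formula treats x_mean as a noisy measurement (SD = x_se) of a
# latent value; mi(x_mean) uses that latent value as the predictor of y.
fit_mi_model <- function(data, ...) {
  brm(
    bf(y ~ mi(x_mean)) +
      bf(x_mean | mi(x_se) ~ 1) +
      set_rescor(FALSE),
    data = data,
    ...
  )
}

# Example call (assumes cmdstanr is installed, as in the model above):
# m2 <- fit_mi_model(d_sum, cores = 4, backend = "cmdstanr")
```

If this works as intended, the slope for the latent predictor in `summary(m2)` should land near the simulated value of 0.5. The older single-formula spelling, `y ~ me(x_mean, x_se)`, expresses a similar idea, but the brms documentation describes `me()` as soft-deprecated in favour of `mi()`.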
Any help much appreciated!