I just can’t wrap my head around how to properly do this with brms.
In the help documentation for mi()
we have the example:
data("nhanes", package = "mice")
N <- nrow(nhanes)
# ... #
# 'mi' terms can also be used when some responses are subsetted
nhanes$sub <- TRUE
nhanes$sub[1:2] <- FALSE
nhanes$id <- 1:N
nhanes$idx <- sample(3:N, N, TRUE)
# this requires the addition term 'index' being specified
# in the subsetted part of the model
bform3 <- bf(bmi | mi() ~ age * mi(chl, idx)) +
bf(chl | mi(se) + subset(sub) + index(id) ~ age) +
set_rescor(FALSE)
But the idx
column seems arbitrary to me. Why is it just a bunch of random numbers between 3 and 10? What does the model actually see?
I’m interested in this because I have a response variable (y_est
in reprex below) which I know is an underestimate of the real value, because for a small subset of individuals I have a much better estimate (“y_true
”), and because there is a prior expectation that the measurement method we use has a downward bias. However, for some of the individuals with y_true
measures I’m not interested in how it varies with our predictors of interest (they’re different but related species). So I’d like to use all the available information to impute (read “bias correct”) y_est
, and then use a subset of the imputed values to test for correlations with the x variables.
For good measure, I also have different groups for which I expect the corelations to vary. This isn’t necessary for the reprex but it’s closer to my actual data.
Here a reprex:
library(tidyverse)
set.seed(78)
N <- 100
b1_est_grpA <- -10
b1_est_grpB <- -7
b1_diff <- b1_est_grpA - b1_est_grpB
y_bias <- 5
n_exclude <- 5
n_true <- 10
group <- c(letters[1:n_exclude],
rep("grpA",floor((n_true-n_exclude)/2)),
rep("grpB",ceiling((n_true-n_exclude)/2)),
rep("grpA",floor((N-n_true)/2)),
rep("grpB",ceiling((N-n_true)/2)))
grpA_b0 <- 75
grpB_b0 <- 50
other_b0 <- mean(c(grpA_b0,grpB_b0))
b0_diff <- grpA_b0 - grpB_b0
is_grpA <- as.numeric(group == "grpA")
x <- runif(N)
y_est <- grpB_b0 + is_grpA*b0_diff + # varying intercept by group
(b1_est_grpB + is_grpA*b1_diff)*x + # varying slope by group
rnorm(N,0,1.5) # error
y_est[1:n_exclude] <- rnorm(n_exclude,other_b0,10) # change the excluded ones to some unrelated values
plot(y_est~x)
points(y_est[1:n_exclude] ~ x[1:n_exclude],col="red",pch=16)
# "true" y is available for some samples, with a bit of error (sigma = 10)
y_true <- c( (y_bias*y_est[1:n_true] + rnorm(n_true,0,10)),
rep(NA,N-n_true))
d <- data.frame(x=x,y_est=y_est,y_true=y_true,group=group)
d$is_ingroup<-d$group %in% c("grpA","grpB")
d$has_true<- !is.na(d$y_true)
head(d,15)
plot(y_true ~ y_est, data = d)
points(y_true ~ y_est, data = d[!d$is_ingroup,],col="red",pch=16)
cor.test(y_est,y_true)
## Model ----
## We want to
## 1. predict missing y_true based on y_est, then
## 2 model y_true ~ 1 + x + (1 + x | group) but
## *only* for the observations in the "ingroup"
library(brms)
library(cmdstanr)
set_cmdstan_path("~/.cmdstan/cmdstan-2.35.0")
# We can think of y_est as a function of y_true
# Can we use mi() and subset() to get the model to look at the data of interest?
d$idx <- 1:nrow(d)
m4 <- brm(
bf(y_est ~ mi(y_true,idx = idx)) +
bf(y_true | mi() + subset(is_ingroup) + index(idx) ~ x*group) +
set_rescor(FALSE),
d,
backend = "cmdstanr",
cores=4
)
# Above returns the error:
# Error: Could not match all indices in response 'ytrue'.
Thanks in advance!
- Operating System: Windows 11
- brms Version: 2.22.0