Ok, yeah, I guess it depends on how you want to deal with the random effects.
In theory perhaps the simplest thing is to just marginalise over the random effects (i.e. take random draws from the random effects distribution unconditional on any biomarker data) and use those in the predictions of survival. Unfortunately, because I initially focused on dynamic predictions, I don’t think I’ve yet allowed these types of predictions for the event submodel (i.e. you can set dynamic = FALSE
for posterior_traj
but it is currently not allowed for posterior_survfit
even though the argument is there and waiting to be implemented!) :-( … But in any case, the uncertainty in the biomarker trajectory without any biomarker data to inform it leads to huge uncertainty in the predictions for the event submodel because they are conditional on the entire biomarker history! (At least from memory that is what I have observed).
An alternative is to generate what I refer to as “standardised predictions”. You can do this within each treatment group and then compare these with observed survival in each treatment group. Standardised survival predictions are basically generating the individual-specific predictions and just averaging them (at each time point). This effectively averages across the random effects for those individuals used in the predictions, as well as averaging across the covariate distribution for those individuals used in the predictions. This is what the ps_check
function does, but only for the entire sample. If you want to do a similar thing within each treatment group, then see the code below.
I discuss some of these things in the vignette… see this subsection and the one that follows it. I’d always be interested to hear thoughts on what approach makes the most sense.
library(rstanarm)
library(ggplot2)
f1 <- stan_jm(formulaLong = logBili ~ year + (1 | id),
dataLong = pbcLong,
formulaEvent = Surv(futimeYears, death) ~ trt,
dataEvent = pbcSurv,
time_var = "year",
# this next line is only to keep the example small in size!
chains = 1, cores = 1, seed = 12345, iter = 1000)
# Option 1) Predictions based on taking random draws
# from the random effects distribution
ndLong <- data.frame(id = c("new1", "new2"))
ndSurv <- data.frame(id = c("new1", "new2"), trt = c(0,1))
psMarg <- posterior_survfit(f1, ndLong, ndSurv, dynamic = FALSE) # currently not allowed
# Option 2) Averaging across individual-specific predictions
# within each treatment group
idtrt0 <- pbcSurv$id[pbcSurv$trt == 0]
idtrt1 <- pbcSurv$id[pbcSurv$trt == 1]
pstrt0 <- posterior_survfit(f1, ids = idtrt0, standardise = TRUE, times = 0)
pstrt1 <- posterior_survfit(f1, ids = idtrt1, standardise = TRUE, times = 0)
# --> compare with observed Kaplan-Meier
kmfit <- survival::survfit(Surv(futimeYears, death) ~ trt, pbcSurv)
kmdat <- data.frame(time = kmfit$time,
surv = kmfit$surv,
trt = rep(0:1, kmfit$strata))
kmdat0 <- kmdat[kmdat$trt == 0, ]
kmdat1 <- kmdat[kmdat$trt == 1, ]
plot0 <- plot(pstrt0) + geom_step(aes(x = time, y = surv), data = kmdat0)
plot1 <- plot(pstrt1) + geom_step(aes(x = time, y = surv), data = kmdat1)
bayesplot::bayesplot_grid(plot0, plot1, grid_args = list(ncol = 2))