Survival predictions by categorical covariates in stan_jm

posterior_survfit as far as I understand generates survival predictions for each subject. In the current implementation of stan_jm, is there a way to predict for a subgroup, like treatment, sex, etc.? @sambrilleman

  • Operating System:
    R version 3.6.0 (2019-04-26)
    Platform: x86_64-apple-darwin15.6.0 (64-bit)
    Running under: macOS Mojave 10.14.3

  • rstanarm Version:
    rstanarm_2.18.2

Check out the ids argument to posterior_survfit. It allows you to select a subset of the individuals in the original data (or new data if new data is being used for the predictions). So you could do something like:

library(rstanarm)
f1 <- stan_jm(formulaLong = logBili ~ year + (1 | id), 
              dataLong = pbcLong,
              formulaEvent = Surv(futimeYears, death) ~ trt, 
              dataEvent = pbcSurv,
              time_var = "year",
              # this next line is only to keep the example small in size!
              chains = 1, cores = 1, seed = 12345, iter = 1000)
ps <- posterior_survfit(f1, ids = pbcSurv$id[pbcSurv$trt == 1])

Would that get you what you are after?

Thanks for the quick reply, @sambrilleman! I am trying to predict at the level of treatment, not at the level of an individual subject (like you would in a vanilla survival analysis), so I can make comparisons to observations (say K-M curves). Something like this:

image

Ok, yeah, I guess it depends on how you want to deal with the random effects.

In theory perhaps the simplest thing is to just marginalise over the random effects (i.e. take random draws from the random effects distribution unconditional on any biomarker data) and use those in the predictions of survival. Unfortunately, because I initially focused on dynamic predictions, I don’t think I’ve yet allowed these types of predictions for the event submodel (i.e. you can set dynamic = FALSE for posterior_traj but it is currently not allowed for posterior_survfit even though the argument is there and waiting to be implemented!) :-( … But in any case, the uncertainty in the biomarker trajectory without any biomarker data to inform it leads to huge uncertainty in the predictions for the event submodel because they are conditional on the entire biomarker history! (At least from memory that is what I have observed).

An alternative is to generate what I refer to as “standardised predictions”. You can do this within each treatment group and then compare these with observed survival in each treatment group. Standardised survival predictions are basically generating the individual-specific predictions and just averaging them (at each time point). This effectively averages across the random effects for those individuals used in the predictions, as well as averaging across the covariate distribution for those individuals used in the predictions. This is what the ps_check function does, but only for the entire sample. If you want to do a similar thing within each treatment group, then see the code below.

I discuss some of these things in the vignette… see this subsection and the one that follows it. I’d always be interested to hear thoughts on what approach makes the most sense.

library(rstanarm)
library(ggplot2)

f1 <- stan_jm(formulaLong = logBili ~ year + (1 | id), 
              dataLong = pbcLong,
              formulaEvent = Surv(futimeYears, death) ~ trt, 
              dataEvent = pbcSurv,
              time_var = "year",
              # this next line is only to keep the example small in size!
              chains = 1, cores = 1, seed = 12345, iter = 1000)

# Option 1) Predictions based on taking random draws 
#           from the random effects distribution
ndLong <- data.frame(id  = c("new1", "new2"))
ndSurv <- data.frame(id  = c("new1", "new2"), trt  = c(0,1))
psMarg <- posterior_survfit(f1, ndLong, ndSurv, dynamic = FALSE) # currently not allowed

# Option 2) Averaging across individual-specific predictions 
#           within each treatment group
idtrt0 <- pbcSurv$id[pbcSurv$trt == 0]
idtrt1 <- pbcSurv$id[pbcSurv$trt == 1]
pstrt0 <- posterior_survfit(f1, ids = idtrt0, standardise = TRUE, times = 0)
pstrt1 <- posterior_survfit(f1, ids = idtrt1, standardise = TRUE, times = 0)
  # --> compare with observed Kaplan-Meier
  kmfit  <- survival::survfit(Surv(futimeYears, death) ~ trt, pbcSurv)
  kmdat  <- data.frame(time  = kmfit$time, 
                       surv  = kmfit$surv, 
                       trt   = rep(0:1, kmfit$strata))
  kmdat0 <- kmdat[kmdat$trt == 0, ]
  kmdat1 <- kmdat[kmdat$trt == 1, ]
  plot0  <- plot(pstrt0) + geom_step(aes(x = time, y = surv), data = kmdat0)
  plot1  <- plot(pstrt1) + geom_step(aes(x = time, y = surv), data = kmdat1)
  bayesplot::bayesplot_grid(plot0, plot1, grid_args = list(ncol = 2))
1 Like

Yep, that’s what I was hoping to find :) The example code is great, thanks.
BTW, thanks for making stan_jm – great contribution and super useful!

@ericnovik I had a go at removing the stop on this today. I just pushed the changes to the branch I’m currently working on for survival models: feature/frailty-models. This might not be the most logical place to add these changes, but oh well. I wanted it on a live development branch since I thought it was better to try these marginalised survival predictions out a bit before trying to merge them into the master.

It’ll probably be a while before the changes make it to CRAN, so you’d have to install a development version of the package to use the functionality (and unfortunately installing from source can take forever, like ~15 minutes, and/or fail if you don’t have the right Makevars settings). Downloads available here for the source package and here for a Windows binary, in case that helps. If you’re on Windows then installing the binary should be fast. Unfortunately I can’t build a Mac binary on my machine and when I tried to do it using Rhub builder I got an error about compiler flags or something.

Hopefully this helps. If you manage to install this and try it out, then I’d be interested to hear how you go.

@sambrilleman I am working on stan_jm at the moment. Any developments on the implementation of marginalization of group specific parameters for predicting survival probabilities in CRAN?

I see the event model containing linear predictor values for each subject in —> model_object[[“survmod”]][[“mod”]][[“linear.predictors”]]. Can this be used as a workaround to determine survival probabilities for each subject say at 12 month mark? But these don’t seem like they are part of the posterior distribution else there will be predictor values for each of the draws in the sampling phase