Predicting future event times that are currently censored

Let’s say we are following a cohort of people over time, each of whom will at some point have an event (e.g. death), and we think the event times follow a Weibull distribution. After some time, we look at the data and fit a Weibull model. What I’d like to do is obtain, for those who don’t have an event yet, the predictive distribution under this model for when they will have their event.

I think I know how I would do this manually, but I wondered whether I can easily get brms to do it for me:

  1. Iterate through the draws from the posterior for the model parameters (either all of them, or fewer if you want fewer predictions)
  2. Sample from the left-truncated Weibull distribution for each person (truncated at their current censoring time) using the same sampled value of the model parameters for all the patients.

That makes sense, right? But I’m not sure how one would get brms to do that.
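
For reference, step 2 on its own could be done with inverse-CDF sampling. Here is a minimal sketch; the function name rtrunc_weibull is just something I made up for illustration:

rtrunc_weibull = function(n, shape, scale, lb){
  # Conditional survival given T > lb:
  # P(T > t | T > lb) = exp( -( (t/scale)^shape - (lb/scale)^shape ) )
  # Inverting this at u ~ Uniform(0, 1) gives a draw that is always >= lb
  u = runif(n)
  scale * ( (lb/scale)^shape - log(u) )^(1/shape)
}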

Here is how I would normally fit the model; below, I will show one attempt I came up with to trick brms into doing what I want.

library(tidyverse)
library(brms)

# n=number of subjects, shape=shape for event time, scale=scale for event times
simulate_data = function(n=1000, shape=0.9, scale=1/0.25){
  tibble(
    # Simulate true event times
    event_times = rweibull(n=n, shape = shape, scale = scale),
    ) %>%
    mutate(
      # only administrative censoring with linear recruitment assumed
      # i.e. first person has up to 1 year follow up, last person 1/n years
      administrative_censoring = ( 1:n() ) / n(), 
      # Event only if not censored
      event = 1L*(event_times < administrative_censoring),
      # Create censoring indicator
      censored = 1L-event,
      # Time to first event or censoring
      time = pmin(event_times, administrative_censoring)
    )
}

set.seed(1234)
simulated_data = simulate_data()

brmfit1 = brm(data=simulated_data, 
              time | cens(censored) ~ 1, 
              family=weibull(), 
              backend = "cmdstanr")
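
To connect this back to the two-step recipe above, here is roughly how I could do it by hand with the draws from this fit, reusing the rtrunc_weibull sketch from earlier. This is only a sketch and assumes I have the brms Weibull parameterisation right (mu is the mean, so scale = mu / gamma(1 + 1/shape)):

draws = as_draws_df(brmfit1)
censored_subjects = simulated_data %>% filter(censored == 1)

manual_preds = map_dfr(seq_len(nrow(draws)), function(i) {
  # Step 1: take one posterior draw of the model parameters
  shape = draws$shape[i]
  scale = exp(draws$b_Intercept[i]) / gamma(1 + 1/shape)
  # Step 2: sample each censored subject's event time, left-truncated at
  # their current follow-up time, using this one set of parameter values
  tibble(
    .draw = i,
    subject = seq_len(nrow(censored_subjects)),
    pred_time = rtrunc_weibull(nrow(censored_subjects), shape, scale,
                               lb = censored_subjects$time)
  )
})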

My one attempt to trick brms into doing what I want is below. The idea is to add a left-truncation variable to the dataset (I called it at_risk_from) and set it initially to a very small number (I used 1e-9; I guess we cannot use 0). Then, at prediction time, we set it to the observed censoring time for each subject that needs a prediction. Is this a valid way of doing it? And is it a good way, or is there something more elegant?

simulate_data = function(n=1000, shape=0.9, scale=1/0.25){
  tibble(
    # Simulate true event times
    event_times = rweibull(n=n, shape = shape, scale = scale),
    ) %>%
    mutate(
      # only administrative censoring with linear recruitment assumed
      # i.e. first person has up to 1 year follow up, last person 1/n years
      administrative_censoring = ( 1:n() ) / n(), 
      # Event only if not censored
      event = 1L*(event_times < administrative_censoring),
      # Create censoring indicator
      censored = 1L-event,
      # Time to first event or censoring
      time = pmin(event_times, administrative_censoring),
      at_risk_from = 1e-9
    )
}

set.seed(1234)
simulated_data = simulate_data()

brmfit1 = brm(data=simulated_data, 
              time | cens(censored) + trunc(lb=at_risk_from) ~ 1, 
              family=weibull(), 
              backend = "cmdstanr")

preds = posterior_predict(object=brmfit1, 
                          newdata=simulated_data %>% 
                            filter(censored==1) %>%
                            mutate( at_risk_from = time) %>%
                            dplyr::select(-time, -censored)
                          )
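
preds should then be a draws-by-subjects matrix, so I could summarise each censored subject’s predicted event time along these lines (just a sketch of how I would look at the output):

pred_summary = tibble(
  pred_median = apply(preds, 2, median),
  pred_lower  = apply(preds, 2, quantile, probs = 0.025),
  pred_upper  = apply(preds, 2, quantile, probs = 0.975)
)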

  • Operating System: Windows 10
  • brms Version: 2.17.0

I am not sure how good an idea it is to change the truncation value at prediction time, because the model parameters were fitted with other bounds in mind. It may be valid, I just don’t know right now, hence my word of caution.

Alternatively, you could replace the Weibull with an exponential family and add an observation-level random effect (i.e., person-level, since we only have one observation per person). When this effect is used in predictions (which happens automatically if you set up newdata correctly), it should hopefully ensure that most of the predictive mass for that person lies above the truncation bound. It is not a hard bound, of course, but I would hope it comes quite close.
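
For concreteness, that alternative could look something like the sketch below. The id column and the model object name are made up here, and I have not checked how well this works in practice:

# Exponential likelihood with a person-level (observation-level) random intercept;
# the id column is added purely for illustration
simulated_data_id = simulated_data %>% mutate(id = row_number())

brmfit2 = brm(data = simulated_data_id,
              time | cens(censored) ~ 1 + (1 | id),
              family = exponential(),
              backend = "cmdstanr")

# Predicting for the censored subjects with their own id reuses each person's
# estimated random effect, which should push most of the predictive mass past
# their observed censoring time (though not as a hard bound)
preds2 = posterior_predict(object = brmfit2,
                           newdata = simulated_data_id %>% filter(censored == 1))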