Let’s say we are following a cohort of people over time that will at some point have an event (e.g. death) and we think that follows a Weibull distribution. After some time, we look at the data and fit a Weibull model. What I’d like to do, is to predict for those that don’t have an event, yet, the predictive distribution based on this model for when they’ll have an event.
I think how I would theoretically do this manually, but I wondered whether I can easily get brms
to do this for me:
- Iterate through the draws from the posterior for the model parameters (either all of them, or fewer if you want fewer predicitons)
- Sample from the left-truncated Weibull distribution for each person (truncated at their current censoring time) using the same sampled value of the model parameters for all the patients.
That makes sense, right? But I’m not sure how one would get brms
to do that.
Here is how I would normally fit the model, below I will show one attempt I came up with to trick brms
into doing what I want.
library(tidyverse)
library(brms)
# n=number of subjects, shape=shape for event time, scale=scale for event times
simulate_data = function(n=1000, shape=0.9, scale=1/0.25){
tibble(
# Simulate true event times
event_times = rweibull(n=n, shape = shape, scale = scale),
) %>%
mutate(
# only administrative censoring with linear recruitment assumed
# i.e. first person has up to 1 year follow up, last person 1/n years
administrative_censoring = ( 1:n() ) / n(),
# Event only if not censored
event = 1L*(event_times < administrative_censoring),
# Create censoring indicator
censored = 1L-event,
# Time to first event or censoring
time = pmin(event_times, administrative_censoring)
)
}
set.seed(1234)
simulated_data = simulate_data()
brmfit1 = brm(data=simulated_data,
time | cens(censored) ~ 1,
family=weibull(),
backend = "cmdstanr")
My one attempt to trick brms into doing what I want is here. The idea I had was to add a left-truncation variable to the dataset (I called it at_risk_from
) and to set it initially to a really low number (I used 1e-9
- I guess we cannot use 0
). Then, once we predict, we set it to the old censoring time for each subject that needs a prediction. I guess this is a valid way of doing this? Is it a good way (or is there something more elegant)?
simulate_data = function(n=1000, shape=0.9, scale=1/0.25){
tibble(
# Simulate true event times
event_times = rweibull(n=n, shape = shape, scale = scale),
) %>%
mutate(
# only administrative censoring with linear recruitment assumed
# i.e. first person has up to 1 year follow up, last person 1/n years
administrative_censoring = ( 1:n() ) / n(),
# Event only if not censored
event = 1L*(event_times < administrative_censoring),
# Create censoring indicator
censored = 1L-event,
# Time to first event or censoring
time = pmin(event_times, administrative_censoring),
at_risk_from = 1e-9
)
}
set.seed(1234)
simulated_data = simulate_data()
brmfit1 = brm(data=simulated_data,
time | cens(censored) + trunc(lb=at_risk_from) ~ 1,
family=weibull(),
backend = "cmdstanr")
preds = posterior_predict(object=brmfit1,
newdata=simulated_data %>%
filter(censored==1) %>%
mutate( at_risk_from = time) %>%
dplyr::select(-time, -censored)
)
- Operating System: Windows 10
- brms Version: 2.17.0