Survival analysis with right censored and interval censored data with brms

Hi,

I’m working on a project that aims to assess the survival of oak seedlings under different conditions.

We planted a few hundred acorns under different types of cover (shrub cover, tree cover, open areas etc.). Many acorns never germinated, but some grew into seedlings. We visited the plots periodically to check if the seedlings were still alive and recorded the number of days the seedlings survived. After a while, we ran out of funding, so we had to visit the sites less often than we would have liked. This means the visits were further and further apart towards the end of the study.

I believe the dataset has two types of censoring, interval censoring and right censoring.

Interval censoring - when we found a seedling alive at “t1” but dead at “t2”, we must assume that it died at some point between “t1” and “t2”.

Right censoring - because some seedlings managed to survive until the end of the study

I’m trying to run the analysis with “brms”, but I’m struggling with the syntax.

This is what the dataset looks like:

days_to_event_alive - number of of days between germination and the last time the seedling was found alive.

days_to_event_death - number of of days between germination and the time the plant was found dead.

Event - cause of death

cens_int - my attempt at creating a censoring column

I wrote the following brms code

fit = brm(days_to_event_alive|cens(cens_int,days_to_event_death) ~ 0+cover_type, data = df,
          prior(normal(0, 10), class = b),
          family = exponential,chains = 4, cores = 4)

and I have the following questions:

  • are my data interval censored and right censored?

  • did I assign the right codes to each type of censoring?

  • is the brms code right? It runs with the full dataset, but it returns “Log probability evaluates to log(0), i.e. negative infinity.” I guess it’s either because the model is not parameterized correctly or I need to set adequate priors. I’m not sure which.

Thanks in the advance for helping me.

Example data:

df<-structure(list(days_to_event = c(1069, 345, 1003, 309, 375, 246, 
1033, 345, 1003, 282, 411, 73, 67, 967, 543, 1033, 1033, 723, 
279, 1033, 753, 348, 1069, 1069, 345, 177, 1033, 164, 411, 411, 
128, 1069, 164, 348, 309, 1033, 753, 1033, 753, 753, 411, 375, 
243, 279, 1033, 213, 411, 723, 375, 513, 543, 1033, 375, 194, 
309, 723, 1003, 579, 1033, 348, 651, 210, 753, 375, 967, 579, 
477, 789, 1069, 375, 279, 1033, 309, 1033, 312, 103, 375, 345, 
753, 213, 1033, 789, 312, 1069, 279, 1033, 1033, 348, 177, 375, 
279, 279, 282, 753, 375, 309, 309, 753, 753, 411), days_to_event_alive = c(1069, 
282, 1003, 246, 312, 177, 1033, 282, 1003, 213, 348, 37, 31, 
687, 375, 753, 753, 513, 230, 1033, 543, 279, 1069, 1069, 282, 
128, 753, 129, 348, 348, 93, 789, 129, 279, 246, 1033, 543, 1033, 
543, 543, 348, 312, 194, 230, 1033, 164, 348, 513, 312, 345, 
375, 1033, 312, 159, 246, 513, 723, 411, 1033, 279, 441, 141, 
543, 312, 967, 411, 309, 579, 1069, 312, 230, 1033, 246, 1033, 
243, 67, 312, 282, 543, 164, 1033, 579, 243, 789, 230, 1033, 
1033, 279, 128, 312, 230, 230, 213, 543, 312, 246, 246, 543, 
543, 348), days_to_event_death = c(1069, 345, 1003, 309, 375, 
246, 1033, 345, 1003, 282, 411, 73, 67, 967, 543, 1033, 1033, 
723, 279, 1033, 753, 348, 1069, 1069, 345, 177, 1033, 164, 411, 
411, 128, 1069, 164, 348, 309, 1033, 753, 1033, 753, 753, 411, 
375, 243, 279, 1033, 213, 411, 723, 375, 513, 543, 1033, 375, 
194, 309, 723, 1003, 579, 1033, 348, 651, 210, 753, 375, 967, 
579, 477, 789, 1069, 375, 279, 1033, 309, 1033, 312, 103, 375, 
345, 753, 213, 1033, 789, 312, 1069, 279, 1033, 1033, 348, 177, 
375, 279, 279, 282, 753, 375, 309, 309, 753, 753, 411), cover_type = c("open", 
"open", "diverse", "salvi", "open", "open", "open", "open", "open", 
"open", "open", "salvi", "open", "ulex", "open", "open", "open", 
"salvi", "open", "diverse", "open", "open", "diverse", "diverse", 
"ulex", "open", "esteva", "open", "open", "open", "open", "diverse", 
"diverse", "open", "ulex", "open", "salvi", "open", "esteva", 
"salvi", "salvi", "salvi", "open", "open", "diverse", "open", 
"open", "open", "salvi", "open", "open", "open", "open", "open", 
"salvi", "open", "esteva", "open", "open", "diverse", "salvi", 
"diverse", "salvi", "open", "open", "diverse", "open", "open", 
"open", "open", "salvi", "open", "open", "diverse", "open", "salvi", 
"salvi", "open", "ulex", "salvi", "salvi", "diverse", "open", 
"open", "diverse", "salvi", "open", "open", "open", "open", "diverse", 
"open", "salvi", "open", "open", "open", "open", "open", "open", 
"open"), Event = c("End of study", "Dried", "End of study", "Dried", 
"Dried", "Dried", "End of study", "Dried", "End of study", "Dried", 
"Dried", "Unknown", "Unknown", "Unknown", "Dried", "Dried", "Dried", 
"Dried", "Dried", "End of study", "Dried", "Dried", "End of study", 
"End of study", "Unknown", "Dried", "Unknown", "Dried", "Dried", 
"Dried", "Dried", "Dried", "Dried", "Dried", "Unknown", "End of study", 
"Dried", "End of study", "Dried", "Dried", "Dried", "Dried", 
"Dried", "Dried", "End of study", "Dried", "Dried", "Dried", 
"Dried", "Dried", "Unknown", "End of study", "Dried", "Dried", 
"Dried", "Dried", "Unknown", "Dried", "End of study", "Dried", 
"Dried", "Dried", "Dried", "Dried", "End of study", "Unknown", 
"Dried", "Dried", "End of study", "Unknown", "Dried", "End of study", 
"Dried", "End of study", "Dried", "Unknown", "Dried", "Dried", 
"Dried", "Dried", "End of study", "Dried", "Dried", "Unknown", 
"Dried", "End of study", "End of study", "Dried", "Destroyed by wildboars", 
"Dried", "Dried", "Dried", "Dried", "Dried", "Dried", "Dried", 
"Dried", "Dried", "Dried", "Dried"), cens_int = c("right", "interval", 
"right", "interval", "interval", "interval", "right", "interval", 
"right", "interval", "interval", "interval", "interval", "interval", 
"interval", "interval", "interval", "interval", "interval", "right", 
"interval", "interval", "right", "right", "interval", "interval", 
"interval", "interval", "interval", "interval", "interval", "interval", 
"interval", "interval", "interval", "right", "interval", "right", 
"interval", "interval", "interval", "interval", "interval", "interval", 
"right", "interval", "interval", "interval", "interval", "interval", 
"interval", "right", "interval", "interval", "interval", "interval", 
"interval", "interval", "right", "interval", "interval", "interval", 
"interval", "interval", "right", "interval", "interval", "interval", 
"right", "interval", "interval", "right", "interval", "right", 
"interval", "interval", "interval", "interval", "interval", "interval", 
"right", "interval", "interval", "interval", "interval", "right", 
"right", "interval", "interval", "interval", "interval", "interval", 
"interval", "interval", "interval", "interval", "interval", "interval", 
"interval", "interval")), row.names = c(NA, -100L), class = c("tbl_df", 
"tbl", "data.frame"))
1 Like

The answer is yes to all questions. The initialization problem does not tend to be a problem if it runs without any bad diagnostics. The sampler just have a little problem finding valid starting points, but finds them quite fast anyway.

2 Likes

Hi!

I have a very similar question as the one from fsdias, but I am not a 100% sure I am applying the solution from @StaffanBetner the right way. Also, I would like to use this survival model in a different way.

I am interested in predicting psychopathology development over time using survival parameters from the survival model.
I have data collected using ESM in daily life during which individuals reported whether they had stressful events and reported their stress for the rest of the day. My overall goal is to find out whether the time to recovery from stressful events can predict psychopathology development. My analysis plan therefore was as follows:

  1. run a survival model to subtract coefs representing individual survival parameters
  2. use these individual survival parameters as predictors in the model predicting psychopathology over time in the form of : psychopathology ~ time*survival_coefs (of course also including all relevant random effects)

The dataset looks like this:
status (1 = recovered to baseline level of stress / 0 = not recovered within followup time)
time1 and time2: when status = 1 (recovered): Time 1 and time 2 are interval in which recovery has happened.When status = 0 (not recovered) then time1 is the amount of follow-up time during which recovery did not occur
SA.b.s is the level of stress at the time of the event (indicator of how stressful the event was
cens is the censoring variabele I created for the brms model

my data is, if I am not mistaken, both right-censored (not everyone recovers in the follow-up time) and interval censored (we know recovery happened somewhere between time1 and time2, but are not sure exactly which minute).

I am not sure whether I created the censoring variable in the right way. I defined it now as 0=observed recovery and 1=censored (right censored). See sample of the data below:

I formulated my brms syntax as follows:
fit1 ← brm(time1 | cens(cens, time2) ~ SA.b.s + (1 + SA.b.s |ID)

My questions are:

  • does the analysis plan make sense?
  • did I create the censoring variable in the right way?
  • is the brms code right? especially with regards to the censoring?

Thanks so much in advance!

Hi,

I assume you’re trying to split the analysis over two models to work around the fact you have a censored covariate?

If time to recovery wasn’t censored you would probably be working with this model, right?

Psychopathology development ~ Bernoulli(p)

logit(p) ~ a + B1*Time_to_recovery + B2[SAbs]

I’ve never worked with models with censored independent variables, but I suspect that’s possible. See:

and

https://cran.r-project.org/web/packages/brms/vignettes/brms_missings.html

If I’m not mistaken, this means that you’d be treating the censored values as missing data, replacing the missing data with parameters and then estimate those parameters using an imputation method method.

Let’s see if someone more qualified than me can weigh in on this.

Thanks so much for your response!
I have indeed focused on survival analysis due to the censored predictor issue. It is a work around. I have found some pages where suggestions were given to handle this with imputation (e.g. Censored predictors · Issue #565 · paul-buerkner/brms · GitHub), but I’m not sure it’s the best way to go… Paul Buerkner commented there that it may be something to add to brms’s functionalities in the future. @paul.buerkner do you maybe have any suggestions?
Thanks @fsdias for your response!

brms does not yet support censored predictors and this topic is a bit outside my expertise that I cannot really suggest any useful method for this right now. Sorry :-(

Thanks for your quick reply @paul.buerkner! No problem, than I will just use this workaround. Does subtracting the coef’s from the brms model to then use them as predictors in the second model sound logical to you? Or do you know anyone who may have expertise on this? Thanks a lot in advance.