Survival analysis with right censored and interval censored data with brms

Hi,

I’m working on a project that aims to assess the survival of oak seedlings under different conditions.

We planted a few hundred acorns under different types of cover (shrub cover, tree cover, open areas etc.). Many acorns never germinated, but some grew into seedlings. We visited the plots periodically to check if the seedlings were still alive and recorded the number of days the seedlings survived. After a while, we ran out of funding, so we had to visit the sites less often than we would have liked. This means the visits were further and further apart towards the end of the study.

I believe the dataset has two types of censoring, interval censoring and right censoring.

Interval censoring - when we found a seedling alive at “t1” but dead at “t2”, we must assume that it died at some point between “t1” and “t2”.

Right censoring - because some seedlings managed to survive until the end of the study

I’m trying to run the analysis with “brms”, but I’m struggling with the syntax.

This is what the dataset looks like:

days_to_event_alive - number of of days between germination and the last time the seedling was found alive.

days_to_event_death - number of of days between germination and the time the plant was found dead.

Event - cause of death

cens_int - my attempt at creating a censoring column

I wrote the following brms code

fit = brm(days_to_event_alive|cens(cens_int,days_to_event_death) ~ 0+cover_type, data = df,
          prior(normal(0, 10), class = b),
          family = exponential,chains = 4, cores = 4)

and I have the following questions:

  • are my data interval censored and right censored?

  • did I assign the right codes to each type of censoring?

  • is the brms code right? It runs with the full dataset, but it returns “Log probability evaluates to log(0), i.e. negative infinity.” I guess it’s either because the model is not parameterized correctly or I need to set adequate priors. I’m not sure which.

Thanks in the advance for helping me.

Example data:

df<-structure(list(days_to_event = c(1069, 345, 1003, 309, 375, 246, 
1033, 345, 1003, 282, 411, 73, 67, 967, 543, 1033, 1033, 723, 
279, 1033, 753, 348, 1069, 1069, 345, 177, 1033, 164, 411, 411, 
128, 1069, 164, 348, 309, 1033, 753, 1033, 753, 753, 411, 375, 
243, 279, 1033, 213, 411, 723, 375, 513, 543, 1033, 375, 194, 
309, 723, 1003, 579, 1033, 348, 651, 210, 753, 375, 967, 579, 
477, 789, 1069, 375, 279, 1033, 309, 1033, 312, 103, 375, 345, 
753, 213, 1033, 789, 312, 1069, 279, 1033, 1033, 348, 177, 375, 
279, 279, 282, 753, 375, 309, 309, 753, 753, 411), days_to_event_alive = c(1069, 
282, 1003, 246, 312, 177, 1033, 282, 1003, 213, 348, 37, 31, 
687, 375, 753, 753, 513, 230, 1033, 543, 279, 1069, 1069, 282, 
128, 753, 129, 348, 348, 93, 789, 129, 279, 246, 1033, 543, 1033, 
543, 543, 348, 312, 194, 230, 1033, 164, 348, 513, 312, 345, 
375, 1033, 312, 159, 246, 513, 723, 411, 1033, 279, 441, 141, 
543, 312, 967, 411, 309, 579, 1069, 312, 230, 1033, 246, 1033, 
243, 67, 312, 282, 543, 164, 1033, 579, 243, 789, 230, 1033, 
1033, 279, 128, 312, 230, 230, 213, 543, 312, 246, 246, 543, 
543, 348), days_to_event_death = c(1069, 345, 1003, 309, 375, 
246, 1033, 345, 1003, 282, 411, 73, 67, 967, 543, 1033, 1033, 
723, 279, 1033, 753, 348, 1069, 1069, 345, 177, 1033, 164, 411, 
411, 128, 1069, 164, 348, 309, 1033, 753, 1033, 753, 753, 411, 
375, 243, 279, 1033, 213, 411, 723, 375, 513, 543, 1033, 375, 
194, 309, 723, 1003, 579, 1033, 348, 651, 210, 753, 375, 967, 
579, 477, 789, 1069, 375, 279, 1033, 309, 1033, 312, 103, 375, 
345, 753, 213, 1033, 789, 312, 1069, 279, 1033, 1033, 348, 177, 
375, 279, 279, 282, 753, 375, 309, 309, 753, 753, 411), cover_type = c("open", 
"open", "diverse", "salvi", "open", "open", "open", "open", "open", 
"open", "open", "salvi", "open", "ulex", "open", "open", "open", 
"salvi", "open", "diverse", "open", "open", "diverse", "diverse", 
"ulex", "open", "esteva", "open", "open", "open", "open", "diverse", 
"diverse", "open", "ulex", "open", "salvi", "open", "esteva", 
"salvi", "salvi", "salvi", "open", "open", "diverse", "open", 
"open", "open", "salvi", "open", "open", "open", "open", "open", 
"salvi", "open", "esteva", "open", "open", "diverse", "salvi", 
"diverse", "salvi", "open", "open", "diverse", "open", "open", 
"open", "open", "salvi", "open", "open", "diverse", "open", "salvi", 
"salvi", "open", "ulex", "salvi", "salvi", "diverse", "open", 
"open", "diverse", "salvi", "open", "open", "open", "open", "diverse", 
"open", "salvi", "open", "open", "open", "open", "open", "open", 
"open"), Event = c("End of study", "Dried", "End of study", "Dried", 
"Dried", "Dried", "End of study", "Dried", "End of study", "Dried", 
"Dried", "Unknown", "Unknown", "Unknown", "Dried", "Dried", "Dried", 
"Dried", "Dried", "End of study", "Dried", "Dried", "End of study", 
"End of study", "Unknown", "Dried", "Unknown", "Dried", "Dried", 
"Dried", "Dried", "Dried", "Dried", "Dried", "Unknown", "End of study", 
"Dried", "End of study", "Dried", "Dried", "Dried", "Dried", 
"Dried", "Dried", "End of study", "Dried", "Dried", "Dried", 
"Dried", "Dried", "Unknown", "End of study", "Dried", "Dried", 
"Dried", "Dried", "Unknown", "Dried", "End of study", "Dried", 
"Dried", "Dried", "Dried", "Dried", "End of study", "Unknown", 
"Dried", "Dried", "End of study", "Unknown", "Dried", "End of study", 
"Dried", "End of study", "Dried", "Unknown", "Dried", "Dried", 
"Dried", "Dried", "End of study", "Dried", "Dried", "Unknown", 
"Dried", "End of study", "End of study", "Dried", "Destroyed by wildboars", 
"Dried", "Dried", "Dried", "Dried", "Dried", "Dried", "Dried", 
"Dried", "Dried", "Dried", "Dried"), cens_int = c("right", "interval", 
"right", "interval", "interval", "interval", "right", "interval", 
"right", "interval", "interval", "interval", "interval", "interval", 
"interval", "interval", "interval", "interval", "interval", "right", 
"interval", "interval", "right", "right", "interval", "interval", 
"interval", "interval", "interval", "interval", "interval", "interval", 
"interval", "interval", "interval", "right", "interval", "right", 
"interval", "interval", "interval", "interval", "interval", "interval", 
"right", "interval", "interval", "interval", "interval", "interval", 
"interval", "right", "interval", "interval", "interval", "interval", 
"interval", "interval", "right", "interval", "interval", "interval", 
"interval", "interval", "right", "interval", "interval", "interval", 
"right", "interval", "interval", "right", "interval", "right", 
"interval", "interval", "interval", "interval", "interval", "interval", 
"right", "interval", "interval", "interval", "interval", "right", 
"right", "interval", "interval", "interval", "interval", "interval", 
"interval", "interval", "interval", "interval", "interval", "interval", 
"interval", "interval")), row.names = c(NA, -100L), class = c("tbl_df", 
"tbl", "data.frame"))

The answer is yes to all questions. The initialization problem does not tend to be a problem if it runs without any bad diagnostics. The sampler just have a little problem finding valid starting points, but finds them quite fast anyway.

1 Like