Comparing adjusted survival of two groups with brms

I am not using Cox regression as the proportional hazards function is violated. Thus, I gave a try for brms, accelerated failure time regression, and this seems promising.

I have to compare the survival of sexes at different time points (1, 3, 6, 12 months). As, I do not feel confident with this, I provide an example with questions at the end.The example is on kidney data.

Data

library(survival)
library(brms)
kidney

Model

fit1 = brm(time | cens(censored) ~ age + sex + disease, data = kidney,
family = weibull, inits = "0")
fit1

Family: weibull 
  Links: mu = log; shape = identity 
Formula: time | cens(censored) ~ age + sex + disease 
   Data: x (Number of observations: 76) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 4000

Population-Level Effects: 
           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept      3.79      0.50     2.85     4.80 1.00     4445     3358
age           -0.00      0.01    -0.02     0.02 1.00     2826     3191
sexfemale      1.59      0.34     0.92     2.25 1.00     3279     2648
diseaseGN     -0.06      0.42    -0.88     0.80 1.00     2363     2701
diseaseAN     -0.53      0.40    -1.31     0.25 1.00     2345     2473
diseasePKD     1.33      0.59     0.19     2.52 1.00     2329     2920

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape     0.98      0.10     0.79     1.18 1.00     3861     2565

Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Survival of males and females

conditional_effects(fit1, effects = "sex")

Questions

  1. brms has several families for analysing survival. Which one to choose? Should I separate models, check their fit with pp_check() and compare them with loo() function, and choose the best?
  2. If I would like to get such plots at different time points. First, is such approach reasonable as the survival at different time points may vary between sexes? Second, should I run separate models or I should predict new data from the same model (fit1). How to code the newdata argument for brms survival model to get a prediction for 30 or 90 days?
  3. Are model summary estimates in log scale? If I exp() them, I will get time ratios?
1 Like

Hi, sorry for not getting to you earlier, your question is relevant and well written.

This is a tough question - really depends on the assumptions you are willing to make and background knowledge you have about the data. Using pp_checks and loo is definitely a good complement to theoretical inquiry. If theory and/or pp_checks don’t constrain the model choice much, you can also run a “multiverse analysis”, i.e. run all variants and see if the conclusions you make change. If they do not change much, you don’t really have to choose which model is “the best”. If they do change substantially, then a further investigation is warranted.

In accelerated failure-time models you are IMHO best served by using predictions with newdata. The problem is that to compare survival between sexes, you need to assume age and disease, as the difference may change with both. You can always do separate comparison for several ages and disease, so then the newdata could be constructed like:

newdata = crossing(sex = c("male", "female"), disease = c("GN", "AN"), 
  age = c(30, 50, 70))

There are then at least two sensible options: either you will use posterior_predict (i.e. include measurement noise) and predict actual times. Since the response of the model is the survival time, you can then easily compute the probability of survival in a subgroup g and sample s as e.g. mean(time[data$group == g, sample] > 30). Alternatively, you could use posterior_linpred to predict both the mean and the shape parameter of the Weibull and find the probabilities analytically from the Weibull CDF - this will let you get values “without measurement noise” but is slightly harder to code. You can always approximate this by simulating predictions for a large number of patients in each group and the using posterior_predict - with large number of patients the measurement noise will get smoothed out.

Once you have the survival probabilities for each group and sample, you can use them to compute differences/odds ratio (or anything else you need) per sample. This gives you samples of th difference/odds ratio that you could summarise as a posterior distribution.

Note that conditional_effects actually does a special case of this process under the hood.

Yes, model summaries in brms are always on the linear scale. So if the link is log (as it is in your model), exponentiating them will get mean time ratios.

Hope that clarifies more than confuses and best of luck with your model!

3 Likes