I have a relatively large survival data (n > 10 000), including 5 predictors. Predictors p1 and p2 are continuous, the rest are categorical. Group variable is the one I am interested in, the remaining are just used for adjusting. PH assumption is violated for 4 predictors (checked with cox.zph()). PH assumption is violated at different time points for different predictors during the three-year follow-up.
chisq df p
group 16.751 3 0.00080
p1 19.076 1 0.0000125584
p2 6.766 1 0.00929
p3 14.469 1 0.00014
p4 0.124 1 0.72447
GLOBAL 60.854 7 0.0000000001
The survival version of rstanarm seems offering really interesting options for modelling. Thank you for developing it guys! How can I address the PH violation, using rstanarm package’s survival functionalities (https://arxiv.org/pdf/2002.09633.pdf)?
My own (rookie) thoughts
A) Surv(time, status) ~ group + p1 + p2 + p3 + p4
I can not use such specification as I violate the following assumptions?
However, my limited understanding says that AFT models are often used in case of violated PH assumption. Why is it wrong to use something like this:
Surv(time, status) ~ group + p1 + p2 + p3 + p4, bazehaz = "weibull-aft"
B) Surv(time, status) ~ tve(group)
tve() functionality is really good, giving time-varying effects when I run that simple model with one predictor. However, can use the tve() for all predictors violating PH assumption? Tried using multiple tve()-s but model seemed not to compile.
C) Time-varying covariates
As four variables violate the PH assumption, can it be one solution? Possible to use multiple time-varying covariates? Or I only use one and address the rest with other options (e.g. tve()).
D) Piecewise constant function
This is a bit similar to solution B as tve() is used? Should I use it only for “group” or also for other predictors?
E) frequentists use strata() for categorical variables
Predictors p3 and p4 are categorical. Does rstanam has an analogue function?
F) Better ideas are very welcome
Data and simple model code for making this reproducible
library(tidyverse)
df = read_delim("https://www.dropbox.com/s/q5r4bqgz6a6e6cx/data.csv?dl=1", ";", escape_double = FALSE, trim_ws = TRUE)
#simple model formula
library(rstanarm)
fit = stan_surv(formula = Surv(time, status) ~ group + p1 + p2 + p3 + p4, data = df, cores = 3, chains = 3)