Rstanarm survival: options for dealing with violated proportional hazards

I have a relatively large survival data (n > 10 000), including 5 predictors. Predictors p1 and p2 are continuous, the rest are categorical. Group variable is the one I am interested in, the remaining are just used for adjusting. PH assumption is violated for 4 predictors (checked with cox.zph()). PH assumption is violated at different time points for different predictors during the three-year follow-up.

chisq df            p
group  16.751  3      0.00080
p1     19.076  1 0.0000125584
p2      6.766  1      0.00929
p3     14.469  1      0.00014
p4      0.124  1      0.72447
GLOBAL 60.854  7 0.0000000001

The survival version of rstanarm seems offering really interesting options for modelling. Thank you for developing it guys! How can I address the PH violation, using rstanarm package’s survival functionalities (https://arxiv.org/pdf/2002.09633.pdf)?

My own (rookie) thoughts

A) Surv(time, status) ~ group + p1 + p2 + p3 + p4

I can not use such specification as I violate the following assumptions?
image

However, my limited understanding says that AFT models are often used in case of violated PH assumption. Why is it wrong to use something like this:

Surv(time, status) ~ group + p1 + p2 + p3 + p4, bazehaz = "weibull-aft"

B) Surv(time, status) ~ tve(group)

tve() functionality is really good, giving time-varying effects when I run that simple model with one predictor. However, can use the tve() for all predictors violating PH assumption? Tried using multiple tve()-s but model seemed not to compile.

C) Time-varying covariates

As four variables violate the PH assumption, can it be one solution? Possible to use multiple time-varying covariates? Or I only use one and address the rest with other options (e.g. tve()).

D) Piecewise constant function

This is a bit similar to solution B as tve() is used? Should I use it only for “group” or also for other predictors?

E) frequentists use strata() for categorical variables
Predictors p3 and p4 are categorical. Does rstanam has an analogue function?

F) Better ideas are very welcome

Data and simple model code for making this reproducible

library(tidyverse)
df = read_delim("https://www.dropbox.com/s/q5r4bqgz6a6e6cx/data.csv?dl=1", ";", escape_double = FALSE, trim_ws = TRUE)

#simple model formula
library(rstanarm)
fit = stan_surv(formula = Surv(time, status) ~ group + p1 + p2 + p3 + p4, data = df, cores = 3, chains = 3)
1 Like

Hi @di4oi4! So the short answer is that tve() is intended to do what you want; i.e. model non-proportional hazards. From memory it has the piecewise constant and the B-splines approaches, to allow some flexibility in how you model the non-proportionality.

I’m a bit hazy on the details these days – but I think it is as the doc says… that the assumption in the AFT model is time-fixed acceleration factors, not proportional hazards. So I think you are right, the AFT won’t enforce PH, but will enforce time-fixed AFT coefficients (i.e. acceleration factors). You’d have to do some plots and model diagnostics I think to work out whether the AFT model solves your non-proportionality issue or not. Although it might not enforce PH, it probably won’t model the non-proportionality in a really flexible way like using cubic B-splines (like tve() attempts to do).

Damn that is annoying. From memory, I think the tve() function should be allowed for multiple predictors in the model. Maybe if you can post a reprex where this fails I could try look into it, although I don’t have a lot of spare time for deep-diving into bug reports atm… :-(

Some suggestions: Is it one of the covariates in particular that is problematic? Or just when you use tve() for more than one covariate? Are the categorical covariates just dummy indicators, or factors with >2 categories?

These would induce non-proportionality, but it is not a modelling solution. Either your covariate varies across time or it doesn’t. This would be a characteristic of your data, not so much the modelling approach.

You should be able to use it for any of the covariates, but the interpretation of a time-varying coefficient is a bit harder when the covariate is continuous (you have to plot the relationship or something for specific values of the covariate).

Whether you use piecewise constant or the default B-splines (solution B) just depends on the shape or type of curve you want to use to model the time-varying coefficient. B-splines are smooth and wiggly, piecewise constant is step-like. If you have defined cutpoints where you know the effect/coefficient would change (e.g. the expiry of the effect of a drug or something) then you might use piecewise constant. If you aren’t sure how the effect varies over time, or you think the changes are smooth over time, then you’d be better off with B-splines as opposed to piecewise constant.

Nah there isn’t any stratification functionality implemented. You’d have to just fit entirely separate models for each of your strata or something.

Hope that helps somewhat!

3 Likes

Thank you! It clarifies a lot.

I had time to test more and everything seems nice. The function tve() can be used with multiple variables in the model, giving time varying effects for each covariated specified in this way.

I found out that I had a problem with only one continuous variable - age. Sampling was really slow and sometimes showed an error as follows:

Log probability evaluates to log(0), i.e. negative infinity. Stan can’t start sampling from this initial value.

After several tried it finally compiled the model, giving an output as follows. Does the plot has a meaningful interpretation, i.e. what is the reference used for calculating hazard ratio?

stan_surv
 baseline hazard: weibull
 formula:         Surv(time, status) ~ tve(age)
 observations:    210
 events:          141 (67.1%)
 right censored:  69 (32.9%)
 delayed entry:   no
------
                 Median MAD_SD exp(Median)
(Intercept)      -8.4    1.1     NA       
age               0.1    0.0    1.1       
age:tve-bs-coef1  0.0    0.0    1.0       
age:tve-bs-coef2  0.0    0.0    1.0       
age:tve-bs-coef3  0.0    0.0    1.0       
smooth_sd[age]    0.1    0.1    1.1       
weibull-shape     0.8    0.1     NA       

One more final question. Is this normal that a model like this may run for a few days? Data includes around 12000 subjects. Continuous variables are centered. One predictor has only 2 levels. Simple models with one predictor run really reasonably quickly (5-10 minutes with the same data).

stan_surv(formula = Surv(time, status) ~ p1 + tve(p2) + p3 , bazehaz = "ms", basehaz_ops = list(df = 8))

I also tried binning the continuous p2 to categorical, but still quite slow. Or it should be expected as I have a spline model with time-varying covariate?

@di4oi4 I’m so sorry I didn’t get around to replying to this.

Although it might be a bit late now, thought I would try answer now anyway.

So I think the plot of the tve hazard ratio here is just a plot of the coefficient as a function of time. The reference/meaning for the hazard ratio would depend on how you’d set up your covariate I guess. The coefficient here would be the relative change in hazard for comparison between covariate values
x = 1 and x = 0 I think, with all other covariates at their reference levels. If that isn’t particularly meaningful, you probably want to at least center and possibly even scale your continuous covariate.

Hmm, that doesn’t sound ideal. The tve models definitely do take longer to fit, since the underlying data is expanded to K quadrature points (default is K = 15 I think). So since the underlying data is expanded MCMC each iteration has to do more work. That is in addition to an increase in the underlying complexity of the model as well I guess. Going from 10 minutes to a few days though sounds like it isn’t just the expansion of the data that is taking the time, the complexity of the model has obviously had an impact and the sampler may be struggling.

It would be worth trying to unpick why that predictor is problematic. Some ideas:

  • does the model struggle with just some form of p2 or categorical p2 in there? If not, perhaps p2 is collinear with other predictors.
  • is the p2 distribution problematic? could scaling it to sd of 1 help?
  • maybe trying to change the priors? (although might not be possible for tve too much, I can’t remember the prior details).
2 Likes

I finally got time to test these solutions. scaling p2 helped! Thanks!

However, I still got one new interesting observation: if I specify p2 with tve() then my model models “group” variable as time-varying? Is this normal behaviour?

m = stan_surv(Surv(time, status) ~ group + tve(p2_scaled), data = data)

stan_surv
 baseline hazard: M-splines on hazard scale
 formula:         Surv(time, status) ~ group + tve(scale(age))
 observations:    700
 events:          473 (67.6%)
 right censored:  227 (32.4%)
 delayed entry:   no
------
                    Median MAD_SD exp(Median)
(Intercept)         -0.4    0.1     NA       
groupE               1.7    0.2    5.4       
p2_scaled            0.4    0.0    1.5       
groupE:tve-bs-coef1 -1.0    0.4    0.4       
groupE:tve-bs-coef2 -0.9    0.4    0.4       
groupE:tve-bs-coef3 -1.3    0.4    0.3       
smooth_sd[groupE]    0.5    0.5    1.6       
m-splines-coef1      0.0    0.0     NA       
m-splines-coef2      0.0    0.0     NA       
m-splines-coef3      0.1    0.0     NA       
m-splines-coef4      0.1    0.0     NA       
m-splines-coef5      0.2    0.1     NA       
m-splines-coef6      0.1    0.1     NA       
m-splines-coef7      0.2    0.1     NA       
m-splines-coef8      0.1    0.0     NA   

image

That definitely looks like a bug. Ideally we should post an issue in the GitHub repo for this. We should probably try pin down exactly when this is happening with a reproducible example. It would be amazing if you are keen to do that, or I can myself sometime this week perhaps.

It may just be a labelling issue. For example what happens if we switch the order of the two variables in the formula.
Because the main effects are all listed and then the smooth terms for the first covariate suggests to me it might just be the labelling in the output that is wrong.

Or it may the tve is actually being fit for the wrong variable. Since this is only cropping up now and not in any earlier examples, perhaps that would be something to do with one of the covariates being continuous (p2_scaled?), which might be more of an edge case, since I think usually people would be using the tve with a factor or 0/1 variable.

2 Likes

Actually, I see in the printout of the formula it is showing that covariate transformation as tve(scale(age)). The scale age thing is a bit odd. Maybe that is causing the odd behaviour. Maybe it is better to manually transform the variable before model fitting. E.g. calculate the SD of the variable and then create the new variable in the data frame by dividing P2 by the SD. Be interesting to know if that helps. It looks like the scale() function might be carrying forward some extra class information for the variable p2_scaled in the data frame and perhaps stan_surv isn’t appropriately able to deal with that class.

2 Likes

Thanks for your toughts! Always happy to help you guys - Survival models with time-varying specification: the first covariate will be modeled as time-varying variable · Issue #529 · stan-dev/rstanarm · GitHub

Few observations. Same issue with categorical variables. And you’re correct in terms of variable order: tve()-specified variable(s) always have to be the first in the model.

I have a doubt.
Is it possible to extract the draws from the plot above to calculate time-varying hazard ratios and compute hypothesis on the most plausible time-varying values of the variable age?
For example I would like to calculate the posterior probability that the hazard ratio at time 900 is below 1…
@sambrilleman

I think posterior_survfit() has an argument called return_matrix that can be used to return the posterior draws for the predictions.

But as for whether you can easily get the posterior draws for the time varying hazard ratio shown in that plot, I can’t recall off the top of my head. You might be able to just get the spline knots etc (for the time varying effect) from the fitted object and then just construct the spline basis at time 900. Then the posterior draws of the coefficients are just accessible using as.matrix() on the fitted model object I think…?

I just took a quick look at some of the source code for plot.stansurv and unfortunately I don’t think the underlying calculations are exposed.

But the more I think about it, you could use posterior_survfit to predict the log hazard for each value the covariate (e.g. groupE = 0 and groupE = 1) at time 900, specifying return_matrix = TRUE to get the posterior draws. Taking the difference between those predictions will get you the time varying log hazard ratio I guess (which can be exponentiated). Hope that helps. Sorry I don’t have a more convenient solution!