Model diagnosis / comparison for survival models

ermeel · August 10, 2018, 4:38am

Hi all,

I would like to learn more about what methods for model diagnosis are available for survival models and how to quantitatively do model comparison in a Bayesian survival setting.

I have seen people discussing the following (rather graphical) methods and would love to hear your thoughts as well as potential (Stan) experiences:

Cox-Snell residuals
Nelson-Aalen estimator of the cumulative hazard rate of the Cox-Snell residuals
Martingale residuals
Deviance residuals
Schoenfeld residuals

Cheers, Eren.

harrelfe · August 10, 2018, 12:25pm

The type of residuals you choose stems from the particular model you have chosen (semiparametric or which of many parametric models) and the particular lack of fit you are trying to detect. In some ways this is not a unified approach and we may want to seek a unified one that does not use residuals.

But of all the residuals you named, smoothed scaled Schoenfeld residual plots have demonstrated the most value in my opinion, to examine the proportional hazards assumption in a Cox semiparametric model.

arya · August 10, 2018, 12:41pm

To me, an intuitive check that would be easy to do in Stan would be in your generated quantities block of your Stan model sample event times for each patient. Then check the coverage rates of the quantiles you get from those sampled times versus the actual event times.

For example, if your survival times are Weibull distributed you can use weibull_rng() in the generated quantities blog to generate simulated survival times.

ermeel · August 10, 2018, 1:26pm

Regarding comparison, what actually about something like elpd_loo? However, this won’t be survival model specific and won’t check directly any of the aspects the residuals consider.

ermeel · August 10, 2018, 2:13pm

That’s a good point, however it requires that you are able to sample (it is sufficient that you can invert the survival function, think inverse transform sampling).

For instance, Weibull is a special case of the family of parametric proportional hazard models by Royston & Parmar, where one has in that special Weibull limiting case actually no spline terms in the log of the cumulative baseline hazard function, hence the log of the cumulative hazard is a linear function in the log of the time. So in the general case of the R&P family, unless there is a away to inverse i-splines (maybe there is, I have to think), that wouldn’t work, right?

arya · August 14, 2018, 3:47pm

Hm I see what you’re saying. The CDF of the survival time T is

P(T \le t) = 1 - S(t) = 1 - \exp \left[- \int_0^{t} \lambda(t) \, dt \right].

For t =0, this probability is zero, and as t increases the probability increases until it eventually gets to one. I think we could just draw a uniform random number, u \sim \mathrm{Uniform}(0,1) then we could integrate the above formula until P(T \le t) = u. The result would be a sample of the survival time T that follows the correct distribution. This integration scheme would be done in the generated quantities.

The only thing is you’d need a function that would integrate P(T \le t) up until a specified u, but that wouldn’t be too difficult. For example if you’re using forward Euler as your integration scheme I imagine you can do something like the following (in R)

integrate_up_to_u <- function(lambda, u, TOLERANCE) {

   Lambda <- 0.0
   t <- 0.0
   for(n in 1:MAX_ITERS) {
      dt <- select_stepsize(lambda, TOLERANCE)
      t <-  t + dt
      Lambda <- Lambda + lambda(t)*dt
      P <- 1 - exp(-Lambda)
      if (P >= u) {
         return(t)
      }
   }

}

You’d have to sort out the minor details like how to backtrack if you’re over-integrated past the specified tolerance, but that’s the basic idea.

wds15 · August 14, 2018, 4:02pm

Why not use the algebra_solver for this? This is a root finding problem which you should be able to handle with it, no?

arya · August 14, 2018, 6:22pm

That’s an excellent point. This is indeed a root finding problem where we want to find the t that satisfies P(T \le t) = u. The problem is that to use the algebraic solver in Stan we’d need P(T \le t) in closed-form as a callable function, which we don’t have. Unless maybe if we wrote it as a function that calls the 1-D integrator inside? I’m not sure if Stan can do that.

wds15 · August 14, 2018, 6:38pm

The 1d integrator is about to land in Stan. In the meantime you can abuse the ode integrator to do the same…I have posted how to do this a few times on discourse.

arya · August 14, 2018, 6:47pm

So it possible to pass in a function to the algebraic solver which itself calls the ODE solver inside of it?

wds15 · August 14, 2018, 7:11pm

Yes!

wds15 · August 14, 2018, 7:25pm

Btw, the stancon contribution from charles earlier this year (if i recall right) did showcase the algebra solver to find the steady state of a pk system…which meant to call the ode integrator in the function passed to the algebra solver.

ermeel · August 15, 2018, 5:47am

@wds15, @arya thanks! In fact the nice property of the R&P family of survival models is that one models directly the log cumulative hazard function, so the integration part is not required, we just need the algebraic solver (plus the ability to evaluate the utilised spline basis for the log cumulative baseline hazard function at arbitrary time points).

I will keep the ode-integrator method in mind for cases where one parametrises the hazard function itself…

Cheers!

ermeel · August 20, 2018, 3:09pm

Just tried to follow up on this, so the first step would be to write the function encoding the algebraic equation, or actually the root-finding problem, (called system in section 20.2. of the manual). The required signature of system is also given there. Since in my case system would need to depend on a uniform random number, which I plan to generate in the generated quantities block (where also the algebra_solver is called), I wonder whether I can include the uniform random number in the x_r array? The manual says x_r must be an expression that only involves data or transformed data variables. Is my scenario valid in this sense?

wds15 · August 20, 2018, 3:11pm

Yes, that is correct. Just sample in generated quantities a 0-1 variate and pass that in via x_r.

ermeel · August 26, 2018, 5:06pm

Another question/thought on this: What about censored survival times? I see how we can check coverage for non-censored times, but what do we compare in case of censored samples? In my applications the majority of times are actually censored.

avehtari · August 27, 2018, 8:02am

Model for censored times is equivalent to binary classification and you can use the usual calibration plot for the probabilities.

ermeel · August 27, 2018, 11:05am

Not sure, if I understand. What if I don’t model explicitly the censoring process, i.e. I do not have a generative model for whether a datapoint is censored or not… Your comment would only apply when I model when I have a corresponding model, right?

avehtari · August 27, 2018, 12:47pm

Can you post Stan code?

arya · August 27, 2018, 2:11pm

Well, one thing you can is if you get a posterior distribution of survival times you can plot the censoring time on top. If the censoring time is way after where all the sampled survival times are, then that’s indicative of model misfit. The model there is saying that the patient should have died way earlier, but they didn’t.

Obviously you don’t get as much diagnostic information as with an uncensored data point, but usually what I look for in comparing posterior predictives is egregious model misfits.

Topic		Replies	Views
How to validate a survival model with both right censoring and interval censoring? Modeling ecology	14	2507	February 25, 2022
Stan for survival models Modeling	26	12310	July 18, 2018
Calculating hazard ratios and interpreting differences in survival and between a multi-level covariate from rstanarm::stan_surv rstanarm techniques , interpret-results , survival	3	1520	May 25, 2021
Survival models in rstanarm Developers	120	12849	October 21, 2020
Plotting and validating survival analysis models Modeling	8	1233	January 8, 2021

Model diagnosis / comparison for survival models

Related topics