Model diagnosis / comparison for survival models


#1

Hi all,

I would like to learn more about what methods for model diagnosis are available for survival models and how to quantitatively do model comparison in a Bayesian survival setting.

I have seen people discussing the following (rather graphical) methods and would love to hear your thoughts as well as potential (Stan) experiences:

  • Cox-Snell residuals
  • Nelson-Aalen estimator of the cumulative hazard rate of the Cox-Snell residuals
  • Martingale residuals
  • Deviance residuals
  • Schoenfeld residuals

Cheers, Eren.


#2

The type of residuals you choose stems from the particular model you have chosen (semiparametric or which of many parametric models) and the particular lack of fit you are trying to detect. In some ways this is not a unified approach and we may want to seek a unified one that does not use residuals.

But of all the residuals you named, smoothed scaled Schoenfeld residual plots have demonstrated the most value in my opinion, to examine the proportional hazards assumption in a Cox semiparametric model.


#3

To me, an intuitive check that would be easy to do in Stan would be in your generated quantities block of your Stan model sample event times for each patient. Then check the coverage rates of the quantiles you get from those sampled times versus the actual event times.

For example, if your survival times are Weibull distributed you can use weibull_rng() in the generated quantities blog to generate simulated survival times.


#4

Regarding comparison, what actually about something like elpd_loo? However, this won’t be survival model specific and won’t check directly any of the aspects the residuals consider.


#5

That’s a good point, however it requires that you are able to sample (it is sufficient that you can invert the survival function, think inverse transform sampling).

For instance, Weibull is a special case of the family of parametric proportional hazard models by Royston & Parmar, where one has in that special Weibull limiting case actually no spline terms in the log of the cumulative baseline hazard function, hence the log of the cumulative hazard is a linear function in the log of the time. So in the general case of the R&P family, unless there is a away to inverse i-splines (maybe there is, I have to think), that wouldn’t work, right?


#6

Hm I see what you’re saying. The CDF of the survival time T is

P(T \le t) = 1 - S(t) = 1 - \exp \left[- \int_0^{t} \lambda(t) \, dt \right].

For t =0, this probability is zero, and as t increases the probability increases until it eventually gets to one. I think we could just draw a uniform random number, u \sim \mathrm{Uniform}(0,1) then we could integrate the above formula until P(T \le t) = u. The result would be a sample of the survival time T that follows the correct distribution. This integration scheme would be done in the generated quantities.

The only thing is you’d need a function that would integrate P(T \le t) up until a specified u, but that wouldn’t be too difficult. For example if you’re using forward Euler as your integration scheme I imagine you can do something like the following (in R)

integrate_up_to_u <- function(lambda, u, TOLERANCE) {

   Lambda <- 0.0
   t <- 0.0
   for(n in 1:MAX_ITERS) {
      dt <- select_stepsize(lambda, TOLERANCE)
      t <-  t + dt
      Lambda <- Lambda + lambda(t)*dt
      P <- 1 - exp(-Lambda)
      if (P >= u) {
         return(t)
      }
   }

}

You’d have to sort out the minor details like how to backtrack if you’re over-integrated past the specified tolerance, but that’s the basic idea.


#7

Why not use the algebra_solver for this? This is a root finding problem which you should be able to handle with it, no?


#8

That’s an excellent point. This is indeed a root finding problem where we want to find the t that satisfies P(T \le t) = u. The problem is that to use the algebraic solver in Stan we’d need P(T \le t) in closed-form as a callable function, which we don’t have. Unless maybe if we wrote it as a function that calls the 1-D integrator inside? I’m not sure if Stan can do that.


#9

The 1d integrator is about to land in Stan. In the meantime you can abuse the ode integrator to do the same…I have posted how to do this a few times on discourse.


#10

So it possible to pass in a function to the algebraic solver which itself calls the ODE solver inside of it?


#11

Yes!


#12

Btw, the stancon contribution from charles earlier this year (if i recall right) did showcase the algebra solver to find the steady state of a pk system…which meant to call the ode integrator in the function passed to the algebra solver.


#13

@wds15, @arya thanks! In fact the nice property of the R&P family of survival models is that one models directly the log cumulative hazard function, so the integration part is not required, we just need the algebraic solver (plus the ability to evaluate the utilised spline basis for the log cumulative baseline hazard function at arbitrary time points).

I will keep the ode-integrator method in mind for cases where one parametrises the hazard function itself…

Cheers!