Model diagnosis / comparison for survival models

Hi all,

I would like to learn more about what methods for model diagnosis are available for survival models and how to quantitatively do model comparison in a Bayesian survival setting.

I have seen people discussing the following (rather graphical) methods and would love to hear your thoughts as well as potential (Stan) experiences:

  • Cox-Snell residuals
  • Nelson-Aalen estimator of the cumulative hazard rate of the Cox-Snell residuals
  • Martingale residuals
  • Deviance residuals
  • Schoenfeld residuals

Cheers, Eren.


The type of residuals you choose stems from the particular model you have chosen (semiparametric or which of many parametric models) and the particular lack of fit you are trying to detect. In some ways this is not a unified approach and we may want to seek a unified one that does not use residuals.

But of all the residuals you named, smoothed scaled Schoenfeld residual plots have, in my opinion, demonstrated the most value, for examining the proportional hazards assumption in a Cox semiparametric model.


To me, an intuitive check that would be easy to do in Stan would be to sample event times for each patient in the generated quantities block of your Stan model. Then check the coverage rates of the quantiles you get from those sampled times against the actual event times.

For example, if your survival times are Weibull distributed, you can use weibull_rng() in the generated quantities block to generate simulated survival times.
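Outside of Stan, the same coverage check can be sketched in a few lines. This is a minimal illustration, not Stan code: the posterior draws of the Weibull shape (alpha) and scale (sigma), and the "observed" times, are all simulated here for the sake of the example; in a real workflow t_rep would come from weibull_rng in generated quantities.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior draws for a Weibull survival model (shape alpha,
# scale sigma); in practice these come out of Stan's posterior.
n_draws, n_patients = 4000, 50
alpha = rng.normal(1.5, 0.05, size=n_draws)
sigma = rng.normal(2.0, 0.05, size=n_draws)

# One simulated event time per posterior draw and patient:
# scale times a standard Weibull(shape) draw.
t_rep = sigma[:, None] * rng.weibull(alpha[:, None], size=(n_draws, n_patients))

# "Observed" event times, simulated here from the same model, so the
# check should pass roughly by construction.
t_obs = 2.0 * rng.weibull(1.5, size=n_patients)

# Empirical coverage of the central 90% posterior predictive interval.
lo, hi = np.quantile(t_rep, [0.05, 0.95], axis=0)
coverage = np.mean((t_obs >= lo) & (t_obs <= hi))
```

If the model fits, roughly 90% of the observed times should fall inside their central 90% predictive intervals; systematic under- or over-coverage points at misfit.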


Regarding comparison, what about something like elpd_loo? However, this won’t be survival-model specific and won’t directly check any of the aspects the residuals consider.

That’s a good point; however, it requires that you are able to sample from the model (it is sufficient that you can invert the survival function, think inverse transform sampling).
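For the cases where the survival function does invert in closed form, the inverse transform is a one-liner. A minimal sketch, using the Weibull as the example (the function name and parameter values are just for illustration):

```python
import numpy as np

def weibull_by_inversion(shape, scale, u):
    """Invert the Weibull survival function S(t) = exp(-(t/scale)**shape).

    Setting 1 - S(t) = u and solving for t gives the inverse-transform
    sample t = scale * (-log(1 - u))**(1/shape)."""
    return scale * (-np.log1p(-u)) ** (1.0 / shape)

rng = np.random.default_rng(1)
u = rng.uniform(size=100_000)
t = weibull_by_inversion(1.5, 2.0, u)

# Reference draws from numpy's own Weibull sampler, for comparison.
t_ref = 2.0 * rng.weibull(1.5, size=100_000)
```

The inverted draws and the library's own sampler should agree in distribution; when no such closed form exists, one falls back on the numerical schemes discussed below.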

For instance, Weibull is a special case of the family of parametric proportional hazards models by Royston & Parmar: in that limiting Weibull case there are no spline terms in the log of the cumulative baseline hazard function, so the log of the cumulative hazard is a linear function of the log of time. In the general case of the R&P family, unless there is a way to invert i-splines (maybe there is, I have to think), that wouldn’t work, right?

Hm I see what you’re saying. The CDF of the survival time T is

P(T \le t) = 1 - S(t) = 1 - \exp \left[- \int_0^{t} \lambda(s) \, ds \right].

For t = 0, this probability is zero, and as t increases the probability increases until it eventually reaches one. I think we could just draw a uniform random number, u \sim \mathrm{Uniform}(0,1), and then integrate the above formula until P(T \le t) = u. The result would be a sample of the survival time T that follows the correct distribution. This integration scheme would be done in generated quantities.

The only thing is you’d need a function that integrates until P(T \le t) reaches the specified u, but that wouldn’t be too difficult. For example, if you’re using forward Euler as your integration scheme, I imagine you can do something like the following (in R):

integrate_up_to_u <- function(lambda, u, dt = 1e-3, max_iters = 1e6) {
   # Accumulate the cumulative hazard Lambda(t) by forward Euler (here with
   # a fixed step size dt in place of an adaptive step-size rule) and stop
   # at the first t where P(T <= t) = 1 - exp(-Lambda) reaches u.
   Lambda <- 0.0
   t <- 0.0
   for (n in 1:max_iters) {
      t <- t + dt
      Lambda <- Lambda + lambda(t) * dt
      P <- 1 - exp(-Lambda)
      if (P >= u) {
         return(t)
      }
   }
   stop("did not reach u within max_iters steps")
}
You’d have to sort out minor details like how to backtrack if the final step overshoots u, but that’s the basic idea.


Why not use the algebra_solver for this? This is a root finding problem which you should be able to handle with it, no?


That’s an excellent point. This is indeed a root-finding problem where we want to find the t that satisfies P(T \le t) = u. The problem is that to use the algebraic solver in Stan we’d need P(T \le t) in closed form as a callable function, which we don’t have. Unless maybe we wrote it as a function that calls the 1-D integrator inside? I’m not sure whether Stan can do that.
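Outside of Stan, that composition (a root finder whose objective calls a quadrature routine) is easy to sketch with SciPy. The hazard below is a hypothetical Weibull hazard (shape 1.5, scale 2.0) standing in for one whose cumulative integral has no closed form:

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

def hazard(t):
    # Hypothetical Weibull hazard: (a/b) * (t/b)**(a-1) with a=1.5, b=2.0.
    return (1.5 / 2.0) * (t / 2.0) ** 0.5

def cdf(t):
    # P(T <= t) = 1 - exp(-Lambda(t)), with Lambda evaluated by quadrature.
    Lam, _ = quad(hazard, 0.0, t)
    return 1.0 - np.exp(-Lam)

def sample_time(u, t_upper=100.0):
    # Root-finding problem: find the t with cdf(t) - u = 0.
    return brentq(lambda t: cdf(t) - u, 0.0, t_upper)
```

For this particular hazard the answer is known in closed form (t = 2 * (-log(1 - u))**(2/3)), which makes the sketch easy to check; the same structure applies when no closed form exists.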


The 1-D integrator is about to land in Stan. In the meantime you can abuse the ODE integrator to do the same; I have posted how to do this a few times on Discourse.


So it is possible to pass a function to the algebraic solver which itself calls the ODE solver inside of it?



Btw, the StanCon contribution from Charles earlier this year (if I recall right) did showcase the algebra solver to find the steady state of a PK system, which meant calling the ODE integrator in the function passed to the algebra solver.


@wds15, @arya thanks! In fact, the nice property of the R&P family of survival models is that one models the log cumulative hazard function directly, so the integration part is not required; we just need the algebraic solver (plus the ability to evaluate the utilised spline basis for the log cumulative baseline hazard function at arbitrary time points).

I will keep the ODE-integrator method in mind for cases where one parametrises the hazard function itself…
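Since log H(t) is modelled directly, sampling reduces to a one-dimensional root-finding problem: solve log H(t) = log(-log(1 - u)) for log t. A minimal sketch using the spline-free Weibull special case, where log H(t) is linear in log t (the coefficients gamma0 and gamma1 are hypothetical; in the general R&P case log_cum_hazard would evaluate the fitted spline basis):

```python
import numpy as np
from scipy.optimize import brentq

# Hypothetical Weibull special case of a Royston & Parmar model:
# log H(t) = gamma0 + gamma1 * log(t), i.e. no spline terms.
gamma0, gamma1 = -1.0, 1.5

def log_cum_hazard(log_t):
    return gamma0 + gamma1 * log_t

def sample_time(u):
    # S(t) = exp(-H(t)) = 1 - u  <=>  log H(t) = log(-log(1 - u)),
    # a one-dimensional root-finding problem in log t.
    target = np.log(-np.log1p(-u))
    log_t = brentq(lambda lt: log_cum_hazard(lt) - target, -20.0, 20.0)
    return np.exp(log_t)
```

Since the cumulative hazard is monotone in t, the root is unique and bracketing it is straightforward; in Stan the same equation would be handed to algebra_solver.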


Just tried to follow up on this. The first step would be to write the function encoding the algebraic equation, or rather the root-finding problem (called system in section 20.2 of the manual); the required signature of system is also given there. In my case, system would need to depend on a uniform random number, which I plan to generate in the generated quantities block (where the algebra_solver is also called). I wonder whether I can include the uniform random number in the x_r array? The manual says x_r must be an expression that only involves data or transformed data variables. Is my scenario valid in this sense?

Yes, that is correct. Just sample a Uniform(0, 1) variate in generated quantities and pass it in via x_r.


Another question/thought on this: What about censored survival times? I see how we can check coverage for non-censored times, but what do we compare in case of censored samples? In my applications the majority of times are actually censored.

The model for censored times is equivalent to a binary classifier, and you can use the usual calibration plot for the probabilities.
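Concretely, for each subject with censoring time c_i the model implies a probability P(T_i <= c_i) of an event before censoring, which can be compared against the observed event indicator as in any probability calibration check. A sketch, with everything (data, censoring mechanism, and the "fitted" probabilities) simulated for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: Weibull event times with uniform censoring times.
n = 2000
t_true = 2.0 * rng.weibull(1.5, size=n)
c = rng.uniform(0.5, 3.0, size=n)
event = (t_true <= c).astype(float)  # 1 = event observed before censoring

# Model-implied probability of an event before each censoring time; here
# the true Weibull CDF stands in for a fitted model's prediction.
p_event = 1.0 - np.exp(-(c / 2.0) ** 1.5)

# Calibration check: within quintile bins of predicted probability,
# compare the mean prediction to the observed event frequency.
edges = np.quantile(p_event, np.linspace(0.0, 1.0, 6))
idx = np.digitize(p_event, edges[1:-1])  # bin index 0..4
pred = np.array([p_event[idx == k].mean() for k in range(5)])
obs = np.array([event[idx == k].mean() for k in range(5)])
```

A well-calibrated model puts pred and obs close to the diagonal; systematic deviations in particular bins indicate where the censored-versus-event predictions are off.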


Not sure if I understand. What if I don’t explicitly model the censoring process, i.e. I do not have a generative model for whether a data point is censored or not… Your comment would only apply when I have such a model, right?

Can you post Stan code?

Well, one thing you can do if you get a posterior distribution of survival times is plot the censoring time on top. If the censoring time is way after where all the sampled survival times are, that’s indicative of model misfit: the model is saying that the patient should have died much earlier, but they didn’t.

Obviously you don’t get as much diagnostic information as with an uncensored data point, but usually what I look for in comparing posterior predictives is egregious model misfits.
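That visual check can be quantified as a single number per censored subject: the fraction of posterior predictive draws that exceed the censoring time. A sketch, with the posterior predictive draws simulated from a hypothetical Weibull in place of a fitted model:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical posterior predictive draws of one patient's survival time
# (Weibull with shape 1.5, scale 2.0 standing in for the fitted model).
t_rep = 2.0 * rng.weibull(1.5, size=4000)

def tail_prob(t_rep, c):
    """Fraction of posterior predictive draws exceeding censoring time c.

    A value near zero means the model believes the patient should have
    died well before c, flagging that censored observation as misfit."""
    return float(np.mean(t_rep > c))

p_plausible = tail_prob(t_rep, 1.0)    # censoring time inside the bulk
p_suspicious = tail_prob(t_rep, 10.0)  # far beyond the predictive mass
```

Scanning these tail probabilities across all censored subjects picks out exactly the egregious misfits described above without having to inspect one plot per patient.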