I would like to learn more about what methods for model diagnosis are available for survival models and how to quantitatively do model comparison in a Bayesian survival setting.

I have seen people discussing the following (rather graphical) methods and would love to hear your thoughts as well as potential (Stan) experiences:

Cox-Snell residuals

Nelson-Aalen estimator of the cumulative hazard rate of the Cox-Snell residuals

The type of residuals you choose stems from the particular model you have chosen (semiparametric or which of many parametric models) and the particular lack of fit you are trying to detect. In some ways this is not a unified approach and we may want to seek a unified one that does not use residuals.

But of all the residuals you named, smoothed scaled Schoenfeld residual plots have demonstrated the most value in my opinion, to examine the proportional hazards assumption in a Cox semiparametric model.

To me, an intuitive check that would be easy to do in Stan would be in your generated quantities block of your Stan model sample event times for each patient. Then check the coverage rates of the quantiles you get from those sampled times versus the actual event times.

For example, if your survival times are Weibull distributed you can use weibull_rng() in the generated quantities blog to generate simulated survival times.

Regarding comparison, what actually about something like elpd_loo? However, this wonâ€™t be survival model specific and wonâ€™t check directly any of the aspects the residuals consider.

Thatâ€™s a good point, however it requires that you are able to sample (it is sufficient that you can invert the survival function, think inverse transform sampling).

For instance, Weibull is a special case of the family of parametric proportional hazard models by Royston & Parmar, where one has in that special Weibull limiting case actually no spline terms in the log of the cumulative baseline hazard function, hence the log of the cumulative hazard is a linear function in the log of the time. So in the general case of the R&P family, unless there is a away to inverse i-splines (maybe there is, I have to think), that wouldnâ€™t work, right?

For t =0, this probability is zero, and as t increases the probability increases until it eventually gets to one. I think we could just draw a uniform random number, u \sim \mathrm{Uniform}(0,1) then we could integrate the above formula until P(T \le t) = u. The result would be a sample of the survival time T that follows the correct distribution. This integration scheme would be done in the generated quantities.

The only thing is youâ€™d need a function that would integrate P(T \le t) up until a specified u, but that wouldnâ€™t be too difficult. For example if youâ€™re using forward Euler as your integration scheme I imagine you can do something like the following (in R)

integrate_up_to_u <- function(lambda, u, TOLERANCE) {
Lambda <- 0.0
t <- 0.0
for(n in 1:MAX_ITERS) {
dt <- select_stepsize(lambda, TOLERANCE)
t <- t + dt
Lambda <- Lambda + lambda(t)*dt
P <- 1 - exp(-Lambda)
if (P >= u) {
return(t)
}
}
}

Youâ€™d have to sort out the minor details like how to backtrack if youâ€™re over-integrated past the specified tolerance, but thatâ€™s the basic idea.

Thatâ€™s an excellent point. This is indeed a root finding problem where we want to find the t that satisfies P(T \le t) = u. The problem is that to use the algebraic solver in Stan weâ€™d need P(T \le t) in closed-form as a callable function, which we donâ€™t have. Unless maybe if we wrote it as a function that calls the 1-D integrator inside? Iâ€™m not sure if Stan can do that.

The 1d integrator is about to land in Stan. In the meantime you can abuse the ode integrator to do the sameâ€¦I have posted how to do this a few times on discourse.

Btw, the stancon contribution from charles earlier this year (if i recall right) did showcase the algebra solver to find the steady state of a pk systemâ€¦which meant to call the ode integrator in the function passed to the algebra solver.

@wds15, @arya thanks! In fact the nice property of the R&P family of survival models is that one models directly the log cumulative hazard function, so the integration part is not required, we just need the algebraic solver (plus the ability to evaluate the utilised spline basis for the log cumulative baseline hazard function at arbitrary time points).

I will keep the ode-integrator method in mind for cases where one parametrises the hazard function itselfâ€¦

Just tried to follow up on this, so the first step would be to write the function encoding the algebraic equation, or actually the root-finding problem, (called system in section 20.2. of the manual). The required signature of system is also given there. Since in my case system would need to depend on a uniform random number, which I plan to generate in the generated quantities block (where also the algebra_solver is called), I wonder whether I can include the uniform random number in the x_r array? The manual says x_r must be an expression that only involves data or transformed data variables. Is my scenario valid in this sense?

Another question/thought on this: What about censored survival times? I see how we can check coverage for non-censored times, but what do we compare in case of censored samples? In my applications the majority of times are actually censored.

Not sure, if I understand. What if I donâ€™t model explicitly the censoring process, i.e. I do not have a generative model for whether a datapoint is censored or notâ€¦ Your comment would only apply when I model when I have a corresponding model, right?

Well, one thing you can is if you get a posterior distribution of survival times you can plot the censoring time on top. If the censoring time is way after where all the sampled survival times are, then thatâ€™s indicative of model misfit. The model there is saying that the patient should have died way earlier, but they didnâ€™t.

Obviously you donâ€™t get as much diagnostic information as with an uncensored data point, but usually what I look for in comparing posterior predictives is egregious model misfits.