I have one problem with this approach: How can you guarantee in the Royston & Parmar proposed Time-varying hazard ratios framework, see graphic below, that the derivative w.r.t. to x of the log cumulative hazard ratio, which you need for the likelihood, is non-negative?

Here we need to replace s in the second figure by the term \gamma_0 + \boldsymbol{\gamma}^T \mathbf{v}(x) in the first graphic. For instance if I choose a spline basis \mathbf{v}(x) that is guaranteed to be monotone in x (e.g. I-splines or integrated B-splines), then we need to ensure that all the elements of \boldsymbol{\gamma} are non-negative. Now importantly, this vector is defined in the first figure in terms of the covariate vector \mathbf{z}, which in general is real and not constraint to be non-negative. Additionally since covariates depend on the subject, we actually have that \boldsymbol{\gamma} is subject specific! Thus we need to pick the matrix \{\gamma_{jl}\} so that all the subject-specific vectors \boldsymbol{\gamma} are non-negative. Depending on the heterogeneity of covariates, this seems like a hopeless task…

I was also thinking maybe a cubic growth in the hazard after the last knot might make sense physiologically as opposed to a linear growth outside the boundary depending on the problem. If we get a new patient whose survival time is after than the last knot, then that patient has survived an amount of time that is unprecedented. Thus, for that patient to be consistent with the rest of the patients we would expect their hazard to be growing rapidly. I’m not sure if that makes sense?

Also, how is the Stan model you wrote doing? I like the idea evaluation of the I-spline basis functions outside of Stan since the survival/censoring times are fixed. That should save some computation. I was thinking in some cases it might be useful to do it from within Stan though. One case I was thinking is if we have a censored patient who we observed up to time t_i, we may want to model and infer their actual time of death as an unknown parameter in Stan instead of treating them as censored e.g.

real<lower=t_i> D_i;

The Bayesian framework and Stan easily allows us to do this which is nice, but it would require evaluation of the I-Splines from within the Stan code. Another example I can think of where we’d want to do that is if we want to sample possible times the patient could have died in the generated quantities block. I think that would be pretty useful for model diagnostics.

@ermeel Yeah, fair point, I hadn’t really thought about the positivity constraints, or the general issues for the approach in a Bayesian setting, until this discussion began.

I noticed in your Stan program you were putting the I-splines on the log cumulative hazard rather than just the cumulative hazard, and you mentioned here that that is your preference – I was just wondering why?

Did you get your I-spline Stan file working, i.e. did the examples in your test.R file recover the correct hazard ratios?

A couple of weeks ago I started on a package just like @jonah had suggested, only missing the ‘r’ in the name! See here. I haven’t got too far on it, and the further I get, the more functions I end up porting across from rstanarm, and the more I begin to think these modelling functions could just go in there so as to not create unnecessary work with priors, predictions, etc. In any case, I had tried a model using the R+P spline basis – didn’t have it running, and then you started this convo, so I tried using the I-spline basis instead, as you have done – but I haven’t got the Stan model working yet (at least not the splines model – a Weibull model seems to work ok). Anyway, take a look if you’re interested.

Off the top of my head, this sounds not unreasonable, however why should the cubic growth be favoured here? The nice thing with the I-Splines is that we can pick the order, and hence the boundary asymptotics.

I haven’t done very extensive tests, but here come my preliminary results (I would really appreciate if you could verify or comment here):

For the breast cancer dataset (from R&P) I could recover the estimated treatment-effect from the literature, i.e. \beta_1,\beta_2 in the R&P’s paper.

For the pediatric cancer dataset, I could recover the \beta vector corresponding to the covariates, as compared to what Hastie & Efron list in their book.

For the leuk dataset version I found here I could recover the authors values of the treatment effect coefficient, which he obtained from coxph with different options to resolve ties.

All the intervals in the figures contain 90\% probability mass of the posterior for that parameter.

I presume the above would imply that I can recover the hazard ratios, if I am not mistaken (since the ratios are \exp(\beta), right?)

I agree. In this regard, it would be super helpful if we could convince ourselves that the recursive expression in the Ramsay paper, allowing to write the I-splines in terms of the M-splines without integrating, is applicable.

This is an interesting example application, I haven’t considered, yet.

Yes that is a very useful diagnostic, I should consider.

Maybe I was a bit vague here. What I meant is that in a Bayesian setting I prefer a spline basis that enforces explicitly monotonicity rather than one that does it “implicitly” by conditioning on the data. I was not referring to any preference of using the i-splines for log cumulative hazard modelling over using them to model the cumulative hazard modelling. But when I think about it, I see that they are fundamentally different, right? In the former the splines are polynomial in \log(t), so the cumulative hazard is an exponential of polynomials in \log(t), whereas in the latter case the cumulative hazard is polynomial in \log(t). Comparing with Weibull case, for instance, this latter approach leads to a much weaker decay of the survival function in \log(t). So I guess it really depends on our physiological model, right?

In the meantime I also added a version that supports random-walk priors for the spline-basis coefficients (\gamma in the Royston&Parmar) terminology. Since the spline-basis (and its derivative) I currently calculate outside of Stan, we can now experiment with a variety of different splines and see whether the random-walk prior would allow us to work with much larger number of knots (see http://mc-stan.org/users/documentation/case-studies/splines_in_stan.html). For instance, if we use integrated B-splines, we have that a random-walk prior would favour a log cumulative hazard that is linearly increasing (since the prior favours flat derivatives, which are the B-splines).

One nice thing about the case where we use a random-walk prior (constrained to be non-negative) and integrated B-splines, \hat{B}_i(x), is that (since \sum_{i} B_i(x) \equiv 1) we have that \sum_{i} \hat{B}_i(x) = x, thus when all weights are equal we can obtain (with an appropriate intercept term) the Weibull distribution. My understanding of the random-walk prior is that it is consistent with prior knowledge stating that the different spline-basis-coefficients change slightly with a tendency for neighbouring coefficients being equal (in average). Thus this would be a nice way to start from Weibull and explore continuous deviations from it, while allowing for much more flexibility due to many more knots being possible.

I’m absolutely happy to collaborate with people who are willing to contribute to survHE. My main objective there is to aid in the task of performing survival analysis to then feed a wider health economic model, but the survival analysis can also be used as a standalone part…

I did implement (a few versions of) Royston-Parmar models in the current version of survHE (BTW, in addition to the version on CRAN, see the development one at https://github.com/giabaio/survHE). Not to say that more efficient clever implementations can’t be made and I’d be happy to discuss!

I’ve thought a lot about this hazard extrapolation problem, and I think it’s worth treading very carefully. The usual “predictions” that one gets from a frequentist Cox model (using an estimator of the baseline cumulative hazard) are not survival times but \hat{S}(t) for various t. The traditional approach to extrapolation is that \hat{S}(t) past the last event time is said to be constant (i.e., hazard is exactly zero). This assumption is both (1) extremely stupid and (2) obvious to the end user. Any other extrapolation method is certainly going to win out on point (1) but lose on point (2). At least for the default behaviors, I favor stupidity in an obvious way over cleverness in an opaque way. For example, I can think of applications where the hazard would not increase rapidly beyond the last observation time (e.g., time to surgery-related infection, where very few infections occur after the end of study), making cubic extrapolation a poor choice.

With respect to treating censored event times as parameters, I see it like any other missing data problem (except maybe more difficult because we’re only observing part of the event time distribution and we may not want to make strong assumptions about the rest of its shape). Putting the unknown event times in the generated quantities block reduces the dimension of the parameter space explored by the sampler, right? Why have more parameters than you need if imputation in the generated quantities block works just as well? You’ll need to assume the form of the hazard function for all t>0 either way. Am I missing something?

Yeah it’s almost like you need a model appropriate to the situation… :) +1 to not sampling event times with HMC, in some ecological models the distribution is so strongly seasonal it periodically hits zero.

Yes, that’s a good point. I guess as @ermeel pointed out we can play with the boundary extrapolation by changing the order of the I-Splines. Also for that type of problem where a certain number of patients will never experience the event of interest within the considered time frame a lot of people have considered Cure Models.

Yes, you’re right. It would be better to just compute the death times in the generated quantities block. I’m a little rusty on survival analysis, but I guess the whole point of Cox models is that you can deal with censored data without having to make assumptions about the distribution of when the patient “would have” died. I was just making the point that there are reasons to do the spline computation in Stan instead of precomputing outside of Stan.

Just want to note that my former students published papers in Statistics in Medicine developing restricted cubic splines in the hazard space (and not log hazard) along with step function time-dependent covariates, getting the analytic form of the cumulative hazard with these time-dependent covariates. Their models worked well, with interruption of MLE iterations if a hazard point was starting to go negative, and using step-halving at that point. Royston did not reference these papers in theirs.

Thanks for mentioning. Could you list the actual references?

An alternative for step-function-time-dependent covariates would be the extended Cox Model (Poisson model equivalent to a piecewise constant proportional hazard Cox model). The advantage I see here is that it does not have the problem with negative likelihoods. The disadvantage is the somewhat more involved data preparation. Of course piecewise constant hazards could be made more flexible with increasing knots and putting a random walk prior over the hazard values. Of course this looses the continuity to some extend given when using (restricted cubic) splines, but I would question whether this is actually such a big problem, when working on hazard scale…

I think the problem with piecewise flat hazards is the extreme sensitivity to choice of cutpoints in time, which if properly reflected in the posterior makes the posterior less sharp. You might also look at adaptive linear spline hazard models: http://data.vanderbilt.edu/fh/Papers/survival/koo95haz.pdf