This is a pretty general question. I’ve seen very sparse discussion/examples about survival models using Stan. Any pointers would be greatly appreciated. Thanks!

Jun Xu, PhD
Professor
Department of Sociology
Ball State University
Muncie, IN 47306

Hi @junxu2018, there’s definitely not (yet) as much as we would like, but there’s still quite a lot of good stuff you can find searching around online. For example:

In addition to the Stan related resources @jonah listed, I recently came across this great explanation of how a proportional hazard model with piecewise constant hazard function is equivalent to a specifically constructed Poisson regression model, which in turn can be modelled easily with brms or rstanarm.

This is also how INLA (with Markovian priors) and GPstuff (with GP priors) do it. Usually the hazard is so slowly changing that piecewise constant is a good approximation without need for high number of pieces and thus computation cost stays low.

One problem with this approach for me is that I do see a problem with the required dataset one has to construct, since one has a datapoint for each individual for each observed time. I doesn’t scale well computationally…

For instance, i have a scenario with around 200k individuals with about 2000 unique times. How to do a Poisson regression here efficiently on standard hardware? Would a sufficient statistic help?

I was wondering whether one could also use a small set of manually specified reference times in practise? Has anyone compared the two alternatives?

In this case it would be Bernoulli model, ie, for each individual at each time interval you would have 0 events or 1 event. Poisson holds only as an approximation to Binomial if you aggregate individuals and the number of suspectible in a time timve interval N_t is high and the Poisson intensity is low ie the number of events M_t<< N_t.

If you have a proportional model, ie if you don’t have interactions between hazard in time, then the maximum complexity is 200k + 2k. It’s aunlikely you would need 2k time intervals and by coarsening the covariates and aggregating this gets is even easier. If you want to have interactions between time and other covariates then you need to coarsen a lot for Stan or use other software than Stan (at least some GP software could handle this using other inference than MCMC and sparse GP approximation)

That is the whole point of Poisson trick. There are papers analysing the relation between the smoothness of the function and how coarse the discterisation can be. Sorry can’t remember the exact references right away.

Personally I think the smooth approximations to the baseline hazard are preferable to the piecewise constant approach. In the smooth spline-based approximations there is likely to be less sensitivity to the choice of the number of df or knots. Whereas the number of discrete segments for the piecewise constant baseline hazard likely needs to be chosen by the user, is more context-specific, and the choice is likely to have a bigger influence on the results. At least that is my feeling.

@ermeel@avehtari@jonah Fyi, I abandoned the repo I was working on in my own GitHub page, and instead started a survival branch on the rstanarm repo. Perhaps worth adding this to the wiki @avehtari? It’s still work in progress, but has options for M-splines on the baseline hazard (closed form for cumulative hazard, i.e. fast) or B-splines on the log baseline hazard (uses quadrature, i.e. slow, with 15 quadrature nodes its about 20 times slower than using the M-splines). I’m going to switch the B-splines out for R&P restricted cubic splines instead though, since they are likely to be more stable I think (as shown in the simulation study in this paper).

I think once you introduce spline-based time-dependent effects, the need for quadrature becomes near unavoidable, hence why I included it as an option in the stan_surv function on that new rstanarm branch. But, one suggestion from a colleague of mine was to use the approach described in Section 2.2 of this paper where they use the R&P approach and just replace the hazard with a small value (e.g. 1E-8) when the hazard strays negative. The approach feels a little funky to me, since you are basically fudging the likelihood, but they seem to claim it works ok, so I might try it out. But I would be interested to know whether you guys think it would work in a Bayesian setting.

My experience is that it really doesn’t matter or the context is that the observations are on intervals anyway, ie observed times with accuracy of day, month, or 6 months. For example, there was a case where times were recorded with one day accuracy, but actual event was actually on a interval between the last recurrence free and the the first recurrence observation time. More accurate model would have been interval censored model, but there was no practical difference to just discretising time (http://pubs.rsna.org/doi/pdf/10.1148/radiol.13131040, http://pubs.rsna.org/doi/suppl/10.1148/radiol.13131040/suppl_file/131040appendix.pdf). Quadrature for piecewise constant is trivial even with interactions with covariates and time dependent covariates. Btw, another example where I used piecewise constant hazard has also online risk calculator http://gistrisk.com/

Thank you all for these great pointers. It’s a bit surprising that Stan hasn’t provided many tools handling survival models. Is it because of the complex data structure (censoring) that survival models usually have? For right censored, for example, exponential regression, Weibull regression, or Cox regression, I am wondering if I could have some Stan codes (using rstan) that could be readily replicated and tweaked for my own problems. Thanks a lot!

I don’t think it’s because of the censoring.
That can be handled in Stan. I think the sections in the Stan manual about censored data are pretty generalizable actually. They could be fleshed out more, but they provide a good template. Have you looked at those?

There are probably many reasons that survival modeling materials for Stan users are not as common as for other other kinds of applications. One reason is that only a few of the Stan developers have much contact with biostatistics and so we rely a lot on Stan users to share their successes/struggles on that front.

There’s no reason Stan can’t be used for survival modeling and we definitely want to do more with survival modeling going forward (as you can see from discussion in some of those recent threads about development of survival modeling R packages based on Stan, adding more survival models to rstanarm, etc.).

This is excellent news, thanks @sambrilleman. I will test it (comparing also to my versions) and come back to you. I’d be happy to collaborate / contribute, as I said before to you. Let me know if you think that works for you and where I could chip in.

What’s the reason that you apply the M-splines to the hazard itself and the B-splines to the log-hazard? Did you do the former, since the cumulative hazard then becomes nicely a linear domination of I-splines, by definition?

I’m not sure which link function for the survival function we should favor in general and whether it makes sense to give the user the possibility to choose the link function as well as the spline.

If I’m not mistaken, this would lead effectively to a rejection of a proposal of a transition that leads to a negative hazard in the MCMC sampler. Would that be related to a posterior conditioned to have non-negative hazard function…?

Thanks for the literature pointers, will check them.

Regarding then quadrature (and independent of the fact that you want to switch to RP splines): there are also expressions for intrgrated b-splines, by the way, see the splines2 package.

I think one of the main reasons might be that it is common in survival analysis for people to fit a regression model on the hazard scale (due to the popularity of the Cox model), but the outcome is observed as a time-to-event. So the approach probably doesn’t feel very familiar to people who are used to more common regression modelling frameworks like, say, a GLM(er) where applying the inverse link function to your linear predictor gets you a predicted value for the outcome.

That’d be great! I’m still pushing changes regularly, so might break things occasionally, but hopefully it will run without too many bugs from now on. I noticed the estimates from the M-splines model, when fit to the Breast Cancer data, were a bit different to the R&P or B-spline estimates, and increasing the df for the M-splines didn’t really improve things (I think this was true when running the model with your .stan file or with stan_surv) – but this only seemed to happen for that one dataset. A simulated dataset seemed to return similar estimates for the M-splines, B-splines, or R&P models, I think. Would be interested to hear what you observe.

(If you want to test using simulated data, then you could download the simsurv package).

Yeah, happy to collaborate on the code. Will just be figuring out the logistics! I’ve made a new class stansurv in rstanarm that inherits the class stanreg and have added a print and prior_summary method for those objects today. Still some other methods to add (including summary and perhaps a basehaz_summary method). For predictions, hopefully we can just use most of the structure from the posterior_survfit function from stan_jm (although the stan_jm stuff is a lot more messy and dense, but still should be a bit useful). Also things like time-dependent effects, AFT models, trying out the constrained likelihood thing… I think we could each work on aspects of those, without confusing each other too much with conflicting commits? The method you mentioned here:

could be quite cool to implement too, if possible – perhaps you could try add it to the surv.stan file I’ve started in rstanarm? In any case, this thread is probably the wrong place for this discussion! – if your keen to collaborate on the rstanarm code then we could move this discussion to GitHub. I’ll open an issue on the rstanarm repo so that we can discuss ideas there…

Observations in survival analysis are almost always quite precise, at least relative to the longest follow-up duration. Human survival studies almost always have survival in days. And I have to agree with @sambrilleman’s assessment that piecewise constant hazards cause problems for us. In surgical studies there is a very high initial risk phase that usually lasts less than a week. A linear spline log hazard function works much better for this (even better: cubic spline). The locations of the change points are too important in piecewise constant hazard models, and I’ve seen great Bayesian statisticians even not allow for uncertainty in their location when in fact locations were data-chosen.

I don’t disagree on these problems if observations are precise. I disagree on “almost always quite precise”. I can believe in the studies you have been involved it has been the case, but in the studies I have been involved it has never been true and in one case the accuracy was 3-6 months with 6 year follow-up. There are so many different types of survival analysis in the world…

Note that I never was against splines as I assumed there are cases where they are better.

Thanks also for you, being so active in this discussion.

I learned Harrell’s C less than 10 years ago, and since then I’ve been involved in only a few survival analysis studies. Recently I’ve learned from you not to use ROC, AUC, or Harrel’s C :)

It was a GIST cancer recurrence after surgery, where CT scans are made with 3-6 months intervals. The CT scan date is recorded with accuracy of one day, but the recurrence can happen anytime between the observation and the last recurrence free CT scan (or surgery). I also made an elaborate model with interval censoring, but there was no practical difference to coarse dicsretization of time.

Another my experience was with FINRISKI study. The of age study population was about 25-75, the follow-up time 10-15 years, and the events were, e.g., death and cardiovascular disease events. Time of death is recorded with 1 minute accuracy, but compared to the overall uncertainty 1 month or 6 month accuracy would probably be sufficient.

Useful comments. I’m not that interested in whether getting higher resolution data changes the 2nd decimal place of statistical estimates, but rather in (1) minimizing ties to make the statistical methods work better and (2) reproducibility. And the great example you listed in GIST I’d really call “current status data”. I do prefer interval censoring because the final results are more interpretable when you do, and more useful in prospective prediction.

The c-index is OK to use depending on the censoring distribution and as long as you only compute it once per day :-) . It’s not very good for comparing two models.