Survival models in rstanarm

Yeah, on the rstanarm GitHub repo there are the feature/survival branch and the feature/frailty-models branch.

The feature/survival one should be good to go and fits a bunch of right-censored survival models. I can’t remember how many extra features I added to that branch.

The feature/frailty-models branch contains a bunch of additional extensions I’ve been working on more recently: left/right/interval-censored data, delayed entry, time-varying effects, time-varying covariates, hierarchical survival models, etc. I can’t remember exactly which of these features were already on the feature/survival branch, but perhaps some. I’ve not added anything related to competing risks or multi-state models though (and I doubt I will get the time to either).

I’ll merge the feature/frailty-models branch into the feature/survival branch in the near future I think (I had just wanted to test it / finalise it a bit more).

There is a pretty detailed vignette here, but you need to install the feature/frailty-models branch to compile the vignette, so it’s not that easy to view at the moment. Maybe I will upload it to my website and then post a link here, just until the vignette is actually available on CRAN.

The approach we took was M-splines on the baseline hazard. So it’s not exactly the Cox equivalent @Emmanuel_Charpentier is after, but it is more parsimonious and requires less data wrangling. One potential downside is that it requires quadrature for time-varying effects (i.e. time-varying hazard ratios), whereas the Poisson time-splitting method doesn’t and so may be faster for that specific use case. Note that we implement the M-splines using the splines2 package, which allows piecewise constant as a special case when the M-spline degree is set equal to 0. So we effectively get piecewise constant baseline hazards implemented for free. Using a large number of knots should get you close to the Poisson time-splitting Cox model approximation (in theory the Cox model occurs once the knots are at each unique event time).
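For concreteness, here is a minimal sketch of what that might look like (assuming the stan_surv() interface on the development branch, with its basehaz and basehaz_ops arguments, and a hypothetical dataset dat with eventtime, status, and trt columns):

```r
library(survival)   # for Surv()
library(rstanarm)   # feature/frailty-models branch

# M-spline baseline hazard (the default)
m1 <- stan_surv(Surv(eventtime, status) ~ trt, data = dat, basehaz = "ms")

# Piecewise constant baseline hazard via the degree = 0 special case
m2 <- stan_surv(Surv(eventtime, status) ~ trt, data = dat,
                basehaz = "ms", basehaz_ops = list(degree = 0))
```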

Yeah, this is correct. If you want the observational unit to be the individual, then you have to calculate the likelihood contribution for each row of data (allowing for delayed entry) and then collapse (i.e. sum) within each individual. This is how the loo method for rstanarm::stan_jm works.

I’ve not yet bothered with that for rstanarm::stan_surv. The log_lik and loo methods for stansurv objects just operate on each row of data, but allow for delayed entry. It’s currently up to the user to decide whether the rows of data are meaningful units of observation. If they aren’t, then I guess they could use the log_lik method, manually collapse (i.e. sum) the rows within each individual, and then pass the collapsed matrix to loo::loo.matrix() for evaluating LOO/WAIC. I guess that eventually it might make sense to have an idvar argument and collapse the log likelihood across unique values of idvar, just like stan_jm does.
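In code, that manual collapsing might look something like this (just a sketch, assuming a fitted stansurv object mod and that mod$data has an id column identifying individuals):

```r
ll <- log_lik(mod)     # matrix: posterior draws x rows of data
id <- mod$data$id      # assumed: individual identifier for each row

# Sum the log-likelihood contributions within each individual
ll_id <- t(rowsum(t(ll), group = id))   # draws x individuals

loo::loo(ll_id)        # PSIS-LOO at the individual level
```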

The most common prediction quantities would be the hazard, cumulative hazard, or survival probability (and their log variants). In rstanarm they are implemented through the rstanarm::posterior_survfit function (as opposed to rstanarm::posterior_predict). At the moment this function will throw an error if the user tries to predict with the original data that contained start/stop notation, for exactly the reason you highlight (i.e. we probably only want a prediction for each individual, not each row).

The way to circumvent the issue is to pass newdata with one row per individual (it’s very unlikely the user actually wants to predict using multiple rows of covariate data for each individual! That would only be necessary for time-varying covariates in the prediction data, which would be a nightmare to interpret). If the reason the user was trying to predict with start/stop data is that they want to predict under the assumption of delayed entry, then the last known survival time should be entered explicitly via the last_time argument and NOT via the prediction data.
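As a sketch (assuming a fitted model mod; the entry column is a hypothetical variable holding each individual’s last known survival time):

```r
nd <- data.frame(id = 1:2, trt = c(0, 1), entry = c(1.5, 2.0))

# One row per individual; delayed entry via last_time, not via start/stop rows
ps <- posterior_survfit(mod, newdata = nd, last_time = "entry")
```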

Hope this helps a bit. Of course it’s all related to rstanarm and not Stan proper, but it should give you some pointers in case you plan to implement something similar using your own Stan code.

As Jonah mentioned, @ermeel will present a bunch of this stuff and related concepts at StanCon this year. We also plan to publish a notebook as part of that StanCon talk.

Just to follow on from my previous message. I’ve managed to upload the current (compiled) vignette for the feature/frailty-models branch, see https://www.sambrilleman.com/vignettes/stan_surv. It’s a static object and I will probably pull it down once this version of rstanarm is up on CRAN. But it gives you an idea of where we are at anyway.

The loo package works just fine for individuals after the log_lik values for each individual are summed together, as mentioned by @sambrilleman. BF corresponds to leave-all-data-out and thus doesn’t distinguish between observations and individuals.

btw, Cox didn’t use a model (temporal prior) for the baseline hazard due to computational reasons, but he did write that better results would be obtained by using a model for the baseline hazard.

Yeah, I especially like this quote that appeared on one of the slides in Paul Lambert’s course and was originally taken from this conversation between David Cox and Nancy Reid:

Reid: “What do you think of the cottage industry that’s grown up around [the Cox model]?”

Cox: “In the light of further results one knows since, I think I would normally want to tackle the problem parametrically. … I’m not keen on non-parametric formulations normally.”

Reid: “So if you had a set of censored survival data today, you might rather fit a parametric model, even though there was a feeling among the medical statisticians that that wasn’t quite right.”

Cox: “That’s right, but since then various people have shown that the answers are very insensitive to the parametric formulation of the underlying distribution. And if you want to do things like predict the outcome for a particular patient, it’s much more convenient to do that parametrically.”

… Even the pioneer of semi-parametric methods himself appears to be pro-parametric!


Hi, I just wanted to ask: do you think a strata() feature would be available or easy to implement? This is the reference for the survival package’s strata function:

That basically fits a model with a stratified baseline (stratified by a factor covariate). It is useful in some cases where the proportional hazards assumption does not hold, and it is what the mstate package uses for fitting multi-state models. I myself have been playing around with that idea for baseline covariate models, based on your work on the feature/survival branch.
https://github.com/csetraynor/mctte/blob/master/src/stan_files/mctte.stan
Although honestly I think it can be done much more efficiently (I wrote this model 6 months ago), and I could not make it work with hierarchical shrinkage priors at the time, etc. I would like to collaborate on the strata feature if you tell me that it will not be included in the release of the feature/survival branch or the feature/frailty-models branch.

This is a perennial question and the usual advice I hear from Andrew is not to be overly concerned with the parametric form under the side condition that it’s not too restrictive. So don’t sweat logistic vs. normal distributions (e.g., probit vs. logit), but do sweat zero-avoiding priors (lognormal vs. half normal).
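A quick illustration of the zero-avoiding distinction (not from the thread, just base R):

```r
# A half-normal places substantial mass near zero, whereas a lognormal is
# zero-avoiding (its density tends to 0 as x approaches 0 from above)
curve(2 * dnorm(x), from = 0, to = 3, ylab = "density")   # half-normal(0, 1)
curve(dlnorm(x), from = 0, to = 3, add = TRUE, lty = 2)   # lognormal(0, 1)
```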


Is this actually also applicable to Bayesian analyses? Are there any (simulation) studies of how sensitive the (marginal) posteriors (say, of hazard ratios) are to the choice of the parametric baseline model (assuming proportional hazards for now)? In particular, what happens when there is particularly strong time dependence in the “true”/generating (baseline) hazard but one imposes a constant baseline hazard in the model used in the Bayesian analysis? E.g. could this lead to biases, or even potentially multi-modal posteriors for some parameters, effectively trying to compromise between different (constant) baseline hazards?

I guess @sambrilleman one could come up with a simulation study using your simsurv R package, but I’d like to check first whether anything is known already.

@ermeel Yeah I think Cox is just referring to the hazard ratios in particular (which of course is all that you get from the Cox model anyway). So the actual fit of the model (e.g. how predicted survival compares to observed survival) can be dramatically affected by how you model the baseline hazard. But the estimates of the hazard ratios in particular are pretty similar regardless of how you model the baseline hazard.

I have no idea what is out there in the literature on this (classical or Bayesian). But yeah, if there isn’t much out there, then it could be something for someone to explore. You could easily use simsurv to simulate the data and then fit the competing models using Stan.
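As a sketch of how such a simulation could start (covariate names and parameter values here are made up):

```r
library(simsurv)

set.seed(1234)
covs <- data.frame(id = 1:500, trt = rbinom(500, 1, 0.5))

# Weibull baseline hazard with strong time dependence; treatment log HR = -0.5
dat <- simsurv(dist = "weibull", lambdas = 0.1, gammas = 2.5,
               betas = c(trt = -0.5), x = covs, maxt = 5)
dat <- merge(covs, dat)

# Then fit competing models in Stan, e.g. a (misspecified) constant baseline
# hazard versus a flexible one, and compare the posteriors for the hazard ratio.
```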


@csetraynor Sorry I don’t mean to ignore this. I just wasn’t sure how to respond and suggest a way to collaborate on it effectively and/or fit it into my own workflow.

FWIW I hadn’t yet planned a strata function, but yeah, stratified models would be worth implementing. From memory, a so-called stratified proportional hazards model means a different baseline hazard in each stratum, but regression coefficients that are common across all strata? I guess this could be a strata() function to be used in the model formula (like the survival package), or a separate argument to stan_surv().
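For reference, the survival package convention being alluded to looks like this (using its bundled lung data; the stan_surv() equivalent is hypothetical at this point):

```r
library(survival)

# Stratified Cox model: a separate baseline hazard for each sex,
# with a common coefficient for age across the strata
fit <- coxph(Surv(time, status) ~ age + strata(sex), data = lung)
```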

The multi-state models, on the other hand, would be a much bigger undertaking: similar to what you have done in your mctte code, I guess, everything would have to become lists (for specifying arguments to stan_surv()) and loops (in the surv.stan file) to incorporate the multiple submodels. This would extend the functionality of stan_surv(), but also add to the length and complexity of the underlying surv.stan code. I guess such extensions would have to happen eventually though (at a minimum, allowing cause-specific hazard models for competing risks, but full-blown multi-state models would be even better). I’m just a bit unsure about heading down that path any time soon, since stan_surv() still needs to get up onto CRAN and get a bit of field testing first!

Hi Sam,
Thanks for your feedback. Yes, the code in mctte.stan was my first approach to multi-state modelling, and unfortunately it is bug-prone right now; I am planning to start (almost) from scratch in the coming weeks. For the multi-state model, a colleague recently suggested that I use the Kolmogorov equations instead.
In practice this makes it possible to avoid repeating the censoring data for each transition hazard. Unfortunately, the use of lists and ragged data structures will be unavoidable, but I am planning to read up on how the Torsten team handle this for mixed-effects models, to learn how to do it in the best way.
I would approach it that way, but yes, I am not in any case saying that there is a pressing need to include this in your stan_surv branch.

Awesome! This sounds really cool. Do keep us updated with progress! I think multi-state functionality would get quite a bit of use…

Any update on the use of Bayesian flexible models? Is it possible to implement a Bayesian Royston-Parmar model?

You can in survHE: see https://github.com/giabaio/survHE/blob/master/inst/doc/survHE.pdf.


Is there any estimate of when this could happen? It would be great to get the stan_surv functionality to CRAN!

RPS (Royston-Parmar spline) models are implemented in survHE, which calls Stan under the hood and allows you to save the resulting code as a model template.
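A minimal sketch with survHE (assuming the lung data from the survival package; k is the number of internal knots for the spline):

```r
library(survHE)
library(survival)

# Bayesian Royston-Parmar spline model, fitted via Stan ("hmc")
fit <- fit.models(Surv(time, status) ~ as.factor(sex), data = lung,
                  distr = "rps", k = 2, method = "hmc")
```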

A small question: how can I get the posterior draws of the survival probabilities using the posterior_survfit function?
I understand it provides summary measures (median, CI, etc.) of the posterior draws, but I’m more interested in the draws themselves,
like what I can get from posterior_predict.

If this is about stan_surv, try using the argument return_matrix = TRUE.

Cf.
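Something like this (a sketch, assuming a fitted stan_surv model mod):

```r
ps <- posterior_survfit(mod, type = "surv", return_matrix = TRUE)
# With return_matrix = TRUE the draws themselves are returned (a matrix of
# posterior draws per prediction time) rather than the summarized data frame
```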


Thank you, very helpful information.


Hi,
Can you give me some more detail about the issue? Perhaps you’d like to open an issue on survHE’s GitHub (assuming your problem is with survHE)?

G

Thanks for the reply. I found the problem: there is one patient who has two different longitudinal measurements at the same time, like

id  time  Y
23  3.4  15
23  3.4  16

I deleted one measurement and it is fine now.
I withdrew the post because it is a data problem, not related to the function.
Thanks anyway.