How to assess discrimination and calibration from a brms survival model

I have a question about survival modeling with brms. Is there a standard way (e.g., some function or algorithm) to evaluate the discrimination and calibration of brms survival models, analogous to what is available for frequentist models? For example, for frequentist survival models I can calculate a bootstrapped c-index and also plot calibration. How do Bayesians do this?

Sorry, don’t have time to answer this, but maybe @sakrejda is not busy and can answer?

Maybe also @sambrilleman and @ermeel can chime in?

In rstanarm (see rstanarm::ps_check) we have a simple method for checking calibration that is based on plotting the predicted survival curve against the Kaplan-Meier (observed) survival curve, possibly stratified by covariate profile(s). I don’t know if that is possible in brms or not; it would depend on what kind of predicted survival probabilities you can get out of brms.
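To make the idea concrete, here is a minimal, language-agnostic sketch of that comparison in plain Python. It is not what ps_check does internally. It assumes you have already exported an array `surv_draws[d][i][k]` holding the predicted survival probability for subject `i` at grid time `k` under posterior draw `d`. The function names, the toy data and the exponential "model" are all made up for illustration.

```python
import math


def kaplan_meier(times, events, grid):
    """Kaplan-Meier survival estimate evaluated at each time in `grid`."""
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    surv = 1.0
    steps = []  # (event time, KM estimate from that time onwards)
    for u in event_times:
        at_risk = sum(1 for t in times if t >= u)
        deaths = sum(1 for t, e in zip(times, events) if t == u and e == 1)
        surv *= 1.0 - deaths / at_risk
        steps.append((u, surv))
    out = []
    for g in grid:
        s = 1.0
        for u, val in steps:
            if u <= g:
                s = val
        out.append(s)
    return out


def mean_predicted_curve(surv_draws):
    """Average predicted survival over draws and subjects at each grid time."""
    n_draws = len(surv_draws)
    n_subj = len(surv_draws[0])
    n_grid = len(surv_draws[0][0])
    return [
        sum(surv_draws[d][i][k] for d in range(n_draws) for i in range(n_subj))
        / (n_draws * n_subj)
        for k in range(n_grid)
    ]


# Toy data (hypothetical): follow-up times and event indicators (1 = event).
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
grid = [1, 3, 6]

# Stand-in for model output: two "posterior draws" of an exponential hazard.
rates = [0.15, 0.25]
surv_draws = [[[math.exp(-r * g) for g in grid] for _ in times] for r in rates]

km = kaplan_meier(times, events, grid)
pred = mean_predicted_curve(surv_draws)
gap = [p - k for p, k in zip(pred, km)]
for g, k, p, d in zip(grid, km, pred, gap):
    print(f"t={g}: KM={k:.3f}  model={p:.3f}  gap={d:+.3f}")
```

For a stratified check, the same two functions could be applied separately to the subjects in each covariate profile.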

In our StanCon 2019 notebook, @ermeel showed how to calculate the restricted mean survival time (RMST) and a time-dependent Brier score. It takes a little bit of manual work, but not too much. You just have to generate an array with the predicted survival probability at each MCMC draw of the parameters; if you can get that from brms, then you can use the code from the notebook to get the RMST and/or the Brier score. The RMST can be compared to a non-parametric estimate of the RMST. The Brier score is a measure of overall prediction error, so it reflects calibration (and, to some extent, discrimination).
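As a rough illustration of the "manual work" (this is not the notebook's code), the sketch below computes both quantities per draw in plain Python, assuming the array `surv_draws[d][i][k]` of predicted survival probabilities already exists. Two simplifications are deliberate and should be read as assumptions: the RMST uses a trapezoidal rule on a coarse grid, and the Brier score simply skips subjects censored before the evaluation time instead of using inverse-probability-of-censoring weights. All names and the toy data are hypothetical.

```python
import math


def rmst(surv_curve, grid):
    """Trapezoidal area under a survival curve on `grid` (grid[0] must be 0)."""
    area = 0.0
    for k in range(1, len(grid)):
        area += 0.5 * (surv_curve[k - 1] + surv_curve[k]) * (grid[k] - grid[k - 1])
    return area


def brier_at(surv_at_t, times, events, t):
    """Naive Brier score at time t, skipping subjects censored before t."""
    total, n = 0.0, 0
    for s, time, event in zip(surv_at_t, times, events):
        if time > t:
            alive = 1.0  # still under observation after t
        elif event == 1:
            alive = 0.0  # event observed by time t
        else:
            continue  # censored by t: status at t is unknown
        total += (alive - s) ** 2
        n += 1
    return total / n


# Toy data (hypothetical) and a stand-in for exported model predictions.
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
grid = [0, 2, 4, 6]
rates = [0.15, 0.25]  # two "posterior draws" of an exponential hazard
surv_draws = [[[math.exp(-r * g) for g in grid] for _ in times] for r in rates]

t_eval = 4
k_eval = grid.index(t_eval)
rmst_draws, brier_draws = [], []
for draw in surv_draws:
    n_subj = len(draw)
    mean_curve = [sum(subj[k] for subj in draw) / n_subj for k in range(len(grid))]
    rmst_draws.append(rmst(mean_curve, grid))
    brier_draws.append(brier_at([subj[k_eval] for subj in draw], times, events, t_eval))

print("RMST up to t=6, one value per draw:", rmst_draws)
print("Brier score at t=4, one value per draw:", brier_draws)
```

Because each quantity is computed once per draw, you get a posterior distribution for the RMST and the Brier score that can be summarised with a mean and an interval.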

We’ve not yet implemented discrimination measures for survival models in rstanarm, but hopefully that can happen in the future. Again, if you can get predicted survival probabilities at each draw of the MCMC parameters then you could calculate something like AUC manually. I’ve played around with this for joint models, but not yet implemented it for standard survival models in rstanarm.
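For what "manually" might look like, here is a small sketch in plain Python. It swaps in Harrell's concordance index rather than a time-dependent AUC, because it is the simpler of the two to write down. It assumes you have a per-draw risk score for each subject, for example one minus the predicted survival probability at a chosen horizon. The function name and the toy numbers are hypothetical.

```python
def harrell_c(risk, times, events):
    """Harrell's concordance index for one vector of risk scores."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a pair is only usable if the earlier time is an event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


# Toy data (hypothetical): follow-up times and event indicators (1 = event).
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]

# risk_draws[d][i]: risk score for subject i under posterior draw d,
# e.g. 1 - S_i(t*) at some horizon t*. The values are made up.
risk_draws = [
    [0.9, 0.7, 0.6, 0.4, 0.1],
    [0.9, 0.4, 0.6, 0.7, 0.1],
]

c_draws = [harrell_c(risk, times, events) for risk in risk_draws]
print("c-index per draw:", c_draws)
```

Computing the index once per draw gives a posterior distribution for it, which plays a role similar to the bootstrap distribution in the frequentist workflow.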

Sorry that there aren’t any convenience functions that I know of (except ps_check in rstanarm), but someone else may be able to point you in the right direction.


@paul.buerkner I hope you don’t mind me tagging you in here; I notice the thread does not have the brms tag. The question/discussion above is about calculating discrimination and calibration metrics for survival models in brms. I’m interested in these questions too, so I guess I’ll pose two questions:

  1. Does brms have any existing functions for calculating AUC, calibration, etc. from survival models?
  2. If not, can brms survival models be used to generate predictions for survival at some user-defined follow-up time? If we have that, users could implement their own versions of those metrics, similar to @sambrilleman's code for joint models.

I tried to construct some discrepancy metrics for Bayesian time-to-event models to detect misspecification of my parametric model, and this was one approach I landed on. For key time points t, I compared my model-based posterior predictions to the KM estimates within subgroups of interest (i.e., control and treated). As a warning, none of the metrics I looked at were particularly sensitive to mild or moderate misspecification of the types I considered.


Hi, sorry for the slow reply. In addition to seconding what @sambrilleman said (calibration against the entire K-M curve), I’ll add that calibration as a function of time is going to be auto-correlated, so the one thing I’d add is to use changes in the observed (K-M) versus modeled curve as the summaries. Survival data are not that informative, so you won’t catch mild mis-specification, as @lcomm points out.
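One way to read "changes in the curve" is to compare the drop in survival over each interval, rather than the cumulative curves themselves. The plain-Python sketch below shows that interpretation, which is an assumption on my part. The two curves are made-up numbers on a shared time grid, and `interval_drops` is a hypothetical helper.

```python
def interval_drops(curve):
    """Drop in survival probability over each consecutive grid interval."""
    return [curve[k - 1] - curve[k] for k in range(1, len(curve))]


# Made-up curves evaluated on the same time grid, for illustration only.
grid = [0, 2, 4, 6]
km_curve = [1.0, 0.8, 0.6, 0.3]        # observed (K-M)
model_curve = [1.0, 0.75, 0.55, 0.40]  # posterior-mean modeled curve

# Cumulative gaps: an early miss carries forward into later time points.
cumulative_gap = [m - k for m, k in zip(model_curve, km_curve)]

# Interval-wise gaps: each interval is judged on its own change.
drop_gap = [
    m - k
    for m, k in zip(interval_drops(model_curve), interval_drops(km_curve))
]

print("cumulative gap:", cumulative_gap)
print("gap in interval drops:", drop_gap)
```

In this toy example the cumulative gap at the second grid time is inherited entirely from the first interval, while the interval-wise gap for the second interval is essentially zero.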