Model diagnosis / comparison for survival models

jackinovik · August 31, 2018, 10:08am

Not sure if others have experience with this, but I’ve been using the time-dependent Brier score, specifically a variation of it using a log loss. It’s summarized well in Graf et al, 1999. Has the limitation that it uses a point estimate of survival probability at a particular survival time, but perhaps you could summarize this over the posterior distribution to get a sense of calibration.

In this formulation, the censored survival times contribute to the score directly up until the time they are censored, and contribute indirectly after that by impacting the weights assigned to observed & non-observed events. It’s not ideal but unless you model the censoring distribution directly, it seems like the best one can do.

One concern I have when using predicted survival times is that the posterior distribution of this predicted value can be very wide (often too wide to be clinically useful!), so this measure may not be sensitive to problems with model fit.

ermeel · September 25, 2018, 4:44am

Thanks for the Brier score reference. For reference, I also list “Consistent Estimation of the Expected Brier Score in General Survival Models with Right-Censored Event Times” by Gerds et al. with the accompanied R package pec.

In survival analysis with censored data the mean squared error of prediction can be estimated by weighted averages of time-dependent residuals. Graf et al. (1999) suggested a robust weighting scheme based on the assumption that the censoring mechanism is independent of the covariates. We show consistency of the estimator. Furthermore, we show that a modified version of this estimator is consistent even when censoring and event times are only conditionally independent given the covariates. The modified estimators are derived on the basis of regression models for the censoring distribution. A simulation study and a real data example illustrate the results.

and

We emphasise that […] all these estimators are robust against misspecification of the survival model.

This is provides an estimator with weaker assumptions for consistency, however requires us to model the censoring time distribution.

If I understand correctly, the potential bias of the estimator now does not depend anymore on the survival model but on the model used for censoring, which is much better if you use the brier score to compare multiple survival models and when use the same underlying model for censoring times.

ermeel · September 25, 2018, 5:00am

Thanks Jacki. If I wanted to include this aspect into my model comparison, might a potential way to check for this aspect to work with posterior predictive checks (for the uncensored cases only) and (graphically) check for sharpness?

For two well calibrated models, the one with the narrower posterior intervals is preferable because its predictions are more tighter. The term introduced for this by Gneiting et al. (2007) is “sharpness.” In the limit, a perfect model would provide a delta function at the true answer with a vanishing posterior interval.

jackinovik · October 10, 2018, 3:24pm

Thanks @ermeel - this sharpness attribute sounds like a useful metric, among well-calibrated models. Particularly when the intended use case involves the predicted survival times.

ermeel · November 11, 2018, 8:19am

Hi Aki, @avehtari,

Model for censored times is equivalent to binary classification and you can use the usual calibration plot for the probabilities.

I know it’s a while ago (we even spoke briefly at StanCon), but could you elaborate more on this?

Or did you have the above brier score in mind?

Can you post Stan code?

Here is some Stan code (simplest version I could think of):

data {
    int<lower=0> N_uncensored;                                      
    int<lower=0> N_censored;                                        
    int<lower=1> NC;                                                
    matrix[N_censored,NC] X_censored;                               
    matrix[N_uncensored,NC] X_uncensored;                           
    vector<lower=0>[N_censored] times_censored;                          
    vector<lower=0>[N_uncensored] times_uncensored;
}
parameters {
    vector[NC] betas;                                     
    real intercept;                                 
}
model {
    betas ~ normal(0,2);                                                            
    intercept   ~ normal(-5,2);                                                     
    target += exponential_lpdf(times_uncensored | exp(intercept+X_uncensored*betas)); 
    target += exponential_lccdf(times_censored  | exp(intercept+X_censored*betas));
}
generated quantities {
    vector[N_uncensored] times_uncensored_sampled;
    {
        real tmp;
        real max_time;
        real max_time_censored;
        max_time = max(times_uncensored);
        max_time_censored = max(times_censored);
        if(max_time_censored > max_time) max_time = max_time_censored;
        
        for(i in 1:N_uncensored) {
            tmp= max_time + 1; 
            while(tmp > max_time) {
                tmp = exponential_rng(exp(intercept+X_uncensored[i,]*betas));
            }
            times_uncensored_sampled[i] = tmp;
        }
    }
}

avehtari · November 11, 2018, 11:19am

I think what I had in my mind would work only for censored at random. Then the distribution of CCDF’s should be uniform. But if the censoring is not random then the censoring process should be modelled, too.

Topic		Replies	Views
Competing risks survival models Modeling	4	1290	April 25, 2018
Stan for survival models Modeling	26	12063	July 18, 2018
Survival models with multiple censoring methods Modeling	3	84	July 31, 2024
K-fold predictive validation of survival (Matlab Interface) Modeling loo	17	2341	September 5, 2017
Survival model for readout/interval data Modeling	1	308	September 1, 2020

Model diagnosis / comparison for survival models

Related topics