K-fold predictive validation of survival (Matlab Interface)

Great that you have found our code useful!

That seems like quite a challenging dataset.

Note that this code doesn’t include our new recommendation for setting the horseshoe hyperparameters ([1707.01694] Sparsity information and regularization in the horseshoe and other shrinkage priors).
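
For reference, the recommendation in that paper ties the global scale to a prior guess of the number of relevant covariates. A minimal Matlab sketch (p0, D, n, and sigma are illustrative names for the prior guess of the number of relevant covariates, the total number of covariates, the number of observations, and the noise standard deviation):

```matlab
% global scale for the (regularized) horseshoe, following Piironen & Vehtari (2017)
tau0 = p0 / (D - p0) * sigma / sqrt(n);
```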

Can you tell how many covariates you have? I’m asking to get a better feeling for whether the horseshoe is useful and whether you might need the regularized horseshoe ([1707.01694] Sparsity information and regularization in the horseshoe and other shrinkage priors) in this case. Since you have quite a lot of censored observations, and censored observations behave like binary classification observations, you may need the regularized horseshoe if the number of covariates is larger than the number of observations.

I think you should include censored observations in your cross-validation.

Correct. For the censored cases you could also compute
weibulldccdf = 1 - weibullcdf(…) with the same parameters, evaluated at the censoring time; see the sketch below.
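
A minimal Matlab sketch of the resulting log predictive density, assuming the Statistics Toolbox wblpdf/wblcdf parameterization (all variable names here are illustrative):

```matlab
% t: observed event or censoring times, d: event indicator (1 = event, 0 = censored)
% a, b: Weibull scale and shape parameters
lp = zeros(size(t));
obs = (d == 1);
lp(obs)  = log(wblpdf(t(obs), a, b));        % log density at observed event times
lp(~obs) = log(1 - wblcdf(t(~obs), a, b));   % log ccdf (survival) at censoring times
logpred = sum(lp);                           % log predictive density for the fold
```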

It’s not ‘log evidence’. It could be called ‘log pseudo-evidence’, following Geisser and Eddy (1979).
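
In symbols, the log pseudo-evidence is the sum of the log leave-one-out predictive densities:

$$\log p_{\text{pseudo}}(y) = \sum_{i=1}^{n} \log p(y_i \mid y_{-i}).$$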

It’s not the evidence, and you should not combine it with model prior probabilities. Although you could call it pseudo-evidence and put it on the same scale as the evidence, that does not seem to be a good approach; you should instead also take into account the uncertainty in the difference (Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC | Statistics and Computing, [1507.04544] Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC). Further evidence of the importance of taking the uncertainty into account can be found in our paper on combining models ([1704.02030] Using stacking to average Bayesian predictive distributions), where we also compare Pseudo-BMA (based on pseudo-evidence differences) and Pseudo-BMA+ (based on both the difference and its uncertainty); the latter gives clearly better results.
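
As a rough illustration, plain Pseudo-BMA weights are just normalized exponentiated elpd estimates. A minimal Matlab sketch (elpd is an illustrative vector of per-model LOO/K-fold elpd estimates; Pseudo-BMA+ additionally uses a Bayesian bootstrap over the pointwise elpd values, which is omitted here):

```matlab
% elpd: 1 x K vector of estimated elpds, one per candidate model
z = elpd - max(elpd);        % subtract the max for numerical stability
w = exp(z) / sum(exp(z));    % Pseudo-BMA weights (ignores uncertainty)
```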

Unfortunately there are also some challenges in interpreting the difference and the related uncertainty, as discussed in my other answer today (Interpreting elpd_diff - loo package - #4 by avehtari).

Note also that if you use cross-validation to compare many models, you may overfit in the selection process, as shown, e.g., in Comparison of Bayesian predictive methods for model selection | Statistics and Computing.

For survival models you might also want to compute something like Harrell’s C (http://onlinelibrary.wiley.com/doi/10.1002/sim.4026/abstract), which may have a more easily interpretable scale; a rough sketch follows.
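
A minimal, O(n^2) Matlab sketch of Harrell’s C (variable names are illustrative, and ties in event times are handled only crudely here):

```matlab
% t: observed times, d: event indicator (1 = event, 0 = censored)
% r: predicted risk scores (higher risk = expected earlier failure)
num = 0; den = 0;
n = numel(t);
for i = 1:n
  for j = 1:n
    % a pair is comparable when subject i has an event before subject j's time
    if d(i) == 1 && t(i) < t(j)
      den = den + 1;
      if r(i) > r(j)
        num = num + 1;       % concordant: the earlier failure has the higher risk
      elseif r(i) == r(j)
        num = num + 0.5;     % tied risk scores count as half
      end
    end
  end
end
C = num / den;               % Harrell's concordance index
```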