Dear Stan team,
I’m a non-statistician newbie and have used a k-fold model comparison scheme based on Vehtari et al. (2016). I would like to check that what follows is correct, especially what I do with the posterior draws once Stan has produced them.
I am trying to predict survival from brain cancer (n=70, of which n=40 are censored) using Peltola’s/Vehtari’s freely available Weibull survival model (single-group model with horseshoe priors):
http://becs.aalto.fi/en/research/bayes/diabcvd/ - thanks so much for this!
The Stan model is:
yobs ~ weibull(alpha, exp(-(mu + Xobs_bg * beta_bg + Xobs_biom * beta_biom)/alpha));
increment_log_prob(weibull_ccdf_log(ycen, alpha, exp(-(mu + Xcen_bg * beta_bg + Xcen_biom * beta_biom)/alpha)));
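For reference, the two statements above correspond to the likelihood terms below, assuming Stan's Weibull parameterisation (shape alpha, scale sigma). The symbols eta_i, sigma_i and x_i are notation introduced here for readability and do not appear in the model code.

```latex
\eta_i = \mu + x_i^{\mathrm{bg}} \beta_{\mathrm{bg}} + x_i^{\mathrm{biom}} \beta_{\mathrm{biom}},
\qquad
\sigma_i = \exp(-\eta_i / \alpha)

% observed death at time y_i (the yobs statement)
p(y_i \mid \alpha, \sigma_i)
  = \frac{\alpha}{\sigma_i} \left( \frac{y_i}{\sigma_i} \right)^{\alpha - 1}
    \exp\!\left[ -\left( \frac{y_i}{\sigma_i} \right)^{\alpha} \right]

% subject censored at time y_i (the ycen statement)
\Pr(T_i > y_i \mid \alpha, \sigma_i)
  = \exp\!\left[ -\left( \frac{y_i}{\sigma_i} \right)^{\alpha} \right]
```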
I have a number of models with different numbers of imaging biomarker covariates (Xobs_biom) and want to see which model is the “best one”. I have fitted each model k times (k = 9), each time training on 8/9 of the n=30 uncensored survival times (plus all of the n=40 censored data) and using the fit to predict the remaining 1/9 of the deaths.
The out-of-sample predictive score is the log pointwise predictive density, which I have calculated outside of Stan, in Matlab, as follows:
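To make the splitting scheme concrete, here is a minimal Python sketch of how the 30 uncensored subjects could be divided into 9 test folds, with the 40 censored subjects staying in every training set. The function name `make_folds`, the seed and the index layout are hypothetical; my actual split was done differently in detail.

```python
import random

def make_folds(n_uncensored=30, k=9, seed=1):
    """Randomly split indices 0..n_uncensored-1 into k near-equal test folds."""
    idx = list(range(n_uncensored))
    random.Random(seed).shuffle(idx)
    # Taking every k-th index gives fold sizes that differ by at most one.
    return [sorted(idx[f::k]) for f in range(k)]

folds = make_folds()
for f, test_idx in enumerate(folds):
    held_out = set(test_idx)
    train_idx = [i for i in range(30) if i not in held_out]
    # Fit the model on train_idx plus ALL censored subjects,
    # then score only the deaths listed in test_idx.
    print(f, len(train_idx), test_idx)
```

Since 30 is not divisible by 9, the folds hold 3 or 4 deaths each, so "8/9" is only approximate.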
For each out-of-sample subject:
- Evaluate weibullpdf(alpha, exp(-(mu + Xobs_bg * beta_bg + Xobs_biom * beta_biom)/alpha)) with the covariate values (Xobs_bg and Xobs_biom) for that subject, at time t, where t = actual time of death.
- Repeat this for all posterior samples of mu and alpha.
- Take the mean and then log this.
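The steps above can be sketched in Python as follows. This is an illustration under assumptions, not my Matlab code: it assumes Stan's Weibull parameterisation (shape alpha, scale sigma), assumes each posterior draw carries beta_bg and beta_biom along with mu and alpha, and uses a hypothetical list-of-dicts layout for the draws. The toy draws are made up purely so the example runs. The mean is taken on the density scale via a log-mean-exp, which avoids underflow when densities are tiny.

```python
import math

def weibull_logpdf(t, alpha, sigma):
    """Log density of Weibull(shape=alpha, scale=sigma) at t > 0."""
    z = t / sigma
    return math.log(alpha / sigma) + (alpha - 1.0) * math.log(z) - z ** alpha

def log_mean_exp(values):
    """log(mean(exp(values))), computed stably by factoring out the maximum."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))

def pointwise_lpd(t, x_bg, x_biom, draws):
    """Log of the posterior-mean density at death time t for one held-out subject.

    draws: list of dicts with keys 'alpha', 'mu', 'beta_bg', 'beta_biom'
    (a hypothetical layout chosen for this sketch).
    """
    logp = []
    for d in draws:
        eta = d["mu"]
        eta += sum(x * b for x, b in zip(x_bg, d["beta_bg"]))
        eta += sum(x * b for x, b in zip(x_biom, d["beta_biom"]))
        sigma = math.exp(-eta / d["alpha"])  # same scale as in the Stan model
        logp.append(weibull_logpdf(t, d["alpha"], sigma))
    return log_mean_exp(logp)

# Toy posterior draws, invented for illustration only
draws = [
    {"alpha": 1.2, "mu": -3.0, "beta_bg": [0.1], "beta_biom": [0.5, -0.2]},
    {"alpha": 1.0, "mu": -2.8, "beta_bg": [0.2], "beta_biom": [0.4, -0.1]},
]
lpd_i = pointwise_lpd(t=14.0, x_bg=[1.0], x_biom=[0.3, 1.1], draws=draws)
```

One thing I would double-check on the Matlab side: as far as I know, Matlab's wblpdf takes its arguments as (x, scale, shape), whereas Stan's weibull is (shape, scale), so the order has to be swapped.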
Following k-fold validation, each model gives n=30 numbers (one for each death), which I sum to produce the ‘out-of-sample marginal log-likelihood’, aka ‘log evidence’, of the model (a negative value). The model with the highest value wins (assuming all models have equal prior probabilities; a difference of 3 is good evidence). I have attached a sample result.
P.S. I have done the last bit outside of Stan (instead of using the generated quantities block) both for my own education and because all the models would take ages to re-run.
Thanks so much!