Hi,
This is a question related to an issue opened in projpred. I have recently been working on projects that include microarray data as covariate information. There is a vast literature on the topic, but IMO there are still many things to do.
The aims of my project can be simplified into two:
-Model survival data from the microarray features.
-Model survival data from the microarray features and treatments (there are many treatments, so possibly using a hierarchical model).
I have already tried shrinkage priors (e.g. the horseshoe and the product normal). Still, the results are not very easy to interpret, especially for non-experts in shrinkage priors.
I have a plan:
1. Similarly to the projpred publication [https://arxiv.org/abs/1810.02406], fit a good reference model, pre-conditioning and possibly also trying BayesPCA.
2. Use v-fold cross-validation to tune the hyperparameter of a Laplace prior, using the optimization routine in Stan to fit the models. Select the microarray features that have a coefficient above a threshold (for example, 1e-3?); see the sketch after this list.
3. Project the posterior by fitting the model with Gaussian priors, again using the optimization routine in Stan.
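For step 2, roughly what I have in mind is sketched below. This is only a sketch, assuming a Gaussian working response `y` standing in for the survival outcome and rstanarm's `algorithm = "optimizing"` as the optimization routine; `df`, the grid of scales, and the threshold are placeholders.

```r
library(rstanarm)

## v-fold cross-validation over a grid of Laplace prior scales, fitting each
## model by optimization; `df` holds the working response `y` plus the
## microarray features as the remaining columns.
cv_laplace_scale <- function(df, scales = c(0.1, 0.5, 1), V = 10) {
  folds <- sample(rep(seq_len(V), length.out = nrow(df)))
  sapply(scales, function(s) {
    mean(sapply(seq_len(V), function(v) {
      fit <- stan_glm(y ~ ., data = df[folds != v, ],
                      prior = laplace(scale = s),
                      algorithm = "optimizing", refresh = 0)
      mu <- predict(fit, newdata = df[folds == v, ])
      mean((df$y[folds == v] - mu)^2)  # held-out squared error; elpd or C-index would also work
    }))
  })
}

## With the selected scale, keep the features whose point estimates pass the threshold, e.g.
## kept <- names(which(abs(coef(final_fit)[-1]) > 1e-3))
```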
I would like your suggestions on this plan: do you think it is feasible to rely on optimization that is not specifically tailored to this problem, and do you know of any resource that could be useful for the projection that I have not found yet?
Thanks @avehtari for your comments already on the issue opened on GitHub.
Also tagging @jpiironen as he is developing projpred.
It would be helpful if you could say something about the number of features and observations.
I don’t understand steps 2 and 3; can you elaborate?
He is still interested in the package, but he is no longer active in its development. For projpred development, at the moment it’s best to tag me, @paul.buerkner, and @AlejandroCatalina.
The number of features varies from one dataset to another, ranging from 600 to 20,000.
The number of data points ranges from 400 to 1,000, and there are also roughly 100 treatments. For the 20,000-feature dataset, however, I am pre-conditioning using the concordance index, so the number of features may end up being 500-1,000 depending on the threshold.
Actually, after not succeeding in getting consistent convergence with Laplace priors and optimization, I slept on it and changed my plan:
1. Fit a horseshoe-prior model to the median projected time-to-event of the reference model.
2. Fit projected models using a normal prior for each sample of the reference model, adding covariates in order of decreasing coefficient size: the first model gets the variable with the largest coefficient, the second model gets two variables, etc., and calculate metrics such as elpd and C-index; a rough sketch follows below.
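A rough sketch of what I mean, ignoring the per-sample aspect for brevity; `df` is assumed to hold a column `t_med` (the reference model's median projected time-to-event) plus the features:

```r
library(rstanarm)
library(loo)

## 1) Horseshoe model for the median projected time-to-event of the reference model
ref <- stan_glm(t_med ~ ., data = df, prior = hs(), refresh = 0)

## 2) Order the features by decreasing size of the posterior median coefficient
b   <- coef(ref)
ord <- names(sort(abs(b[names(b) != "(Intercept)"]), decreasing = TRUE))

## 3) Nested submodels with normal priors, scored here by LOO elpd
elpd_path <- sapply(seq_along(ord), function(k) {
  sub <- stan_glm(reformulate(ord[seq_len(k)], response = "t_med"),
                  data = df, prior = normal(), refresh = 0)
  loo(sub)$estimates["elpd_loo", "Estimate"]
})
```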
Why are you not using a horseshoe prior in the reference model? Instead of the median projected time-to-event, it would probably be better to use the mean of the latent location parameter (also known as the linear predictor).
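A minimal sketch of what I mean, assuming the reference model is an rstanarm/brms-type fit (`ref_fit` is a placeholder name):

```r
## Posterior draws of the latent location (linear predictor), one column per observation
eta_draws <- posterior_linpred(ref_fit)
## Per-observation posterior mean of the linear predictor, to be used as the working target
z_ref <- colMeans(eta_draws)
```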
I don’t understand this. Projection doesn’t need a prior (unless you are referring to a small regularization to avoid numerical problems?).
Yes, do you think it would perform better than the BayesPCA approach? I believe the horseshoe for the reference model is an excellent idea, and I may as well compare both approaches for constructing the reference model.
That would be better, right? So then the variable selection would be done using a Gaussian likelihood estimating the mean of the latent location parameter, is that it? That got me thinking: could one construct a custom model in the current structure of projpred and use the Gaussian GLM? EDIT: No, I remember why I thought the custom model might not work in the current structure of projpred: I think it requires passing the original data for computing LOO and the other metrics. But that was close. If that approach works, implementing conventional survival models in projpred would be even more convenient; I may as well work on that in a separate branch.
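For the record, the "Gaussian GLM on the latent mean" idea would look roughly like the sketch below (not tested; `df` is assumed to hold the features plus a column `z_ref` with the posterior mean of the reference model's linear predictor, as in the snippet above), though as said this sidesteps rather than solves how to compute LOO and the other metrics against the original survival data:

```r
library(rstanarm)
library(projpred)

## Gaussian "reference" fit to the latent mean, horseshoe prior on the features
fit_gauss <- stan_glm(z_ref ~ ., data = df, prior = hs(), refresh = 0)

## Standard projpred machinery on the Gaussian fit
refm <- get_refmodel(fit_gauss)
vs   <- varsel(refm)        # or cv_varsel(refm) for a cross-validated search
plot(vs, stats = "elpd")
suggest_size(vs)
```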
Yes, I know, I said it to avoid using uniform priors… Or maybe it is possible to use the optimization routine here? Or just to use inflated normal priors; I saw that in projpred L2 regularization is an option for this step.
BayesPCA can be faster in any case. BayesPCA is more appropriate if you assume that a high proportion of the predictors are relevant but highly correlated. The regularized horseshoe is better if you assume that only a small number of predictors are relevant.
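If you go with the regularized horseshoe, its global scale can be set from a prior guess of the number of relevant predictors, as recommended in the regularized horseshoe paper by Piironen and Vehtari. A small sketch, with placeholder numbers and the residual scale taken as 1 for illustration:

```r
p  <- 1000   # number of candidate features (placeholder)
n  <- 400    # number of observations (placeholder)
p0 <- 20     # prior guess of the number of relevant features (placeholder)

## tau0 = p0 / (p - p0) * sigma / sqrt(n), with sigma taken as 1 here
tau0 <- p0 / (p - p0) / sqrt(n)
prior_rhs <- rstanarm::hs(global_scale = tau0, slab_scale = 2.5, slab_df = 4)
```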
I haven’t had much time to think about this specific case recently (we did think about related cases some time ago), so I may be making mistakes. I would say: assume Gaussian pseudo-observations and a Gaussian observation model. Working in the latent space makes the target less skewed. For the pseudo-observations you would use the mean of the latent location and the scale from the natural parameterisation. Sorry, I don’t have an exact answer right now, but I can get back to this next week.