Can I use projpred for this exploratory approach?


I have not used projpred in practice yet, but I have heard of it. Since I am writing a preregistration, I wanted to quickly ask here whether it can be used for the following purpose:

We measured y three times, and we have 9 “candidate” predictor variables. These predictors include categorical, interval-scaled, and ordinal variables. Additionally, we have 3 confounding variables that, based on theory, must be included. We want to answer the question:

Which of these 9 variables (if any) can help to predict y? This is an exploratory study aimed at identifying predictors of y. Because the question is unexplored, the results should help researchers build theory and hypotheses in the future.

Is projpred suitable for that? If not, can you recommend an alternative approach? Otherwise, I would compute LOO-CV for many models. However, that would be much more work: with 9 variables, many different combinations are possible, and it might be that only certain combinations of variables work together, e.g., due to masking effects in regression models. On the other hand, if I did that, I might also be able to identify variables that clearly improve out-of-sample prediction. Does projpred also consider different combinations of predictors?
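To put a number on "many different combinations": with the 3 confounders always included, every subset of the 9 candidates is a separate model, so an exhaustive LOO-CV comparison means 2^9 = 512 models. The sketch below contrasts that with a greedy forward search, which adds one predictor at a time by trying each remaining candidate alongside the ones already selected. Forward search is one of the search strategies projpred can use, but this is only a counting illustration under that assumption, not projpred code.

```python
from math import comb

n_candidates = 9  # candidate predictors; the 3 confounders are always included

# Exhaustive search: every subset of the 9 candidates, including the empty set
exhaustive = sum(comb(n_candidates, k) for k in range(n_candidates + 1))

# Greedy forward search: at step k, each of the (9 - k) remaining candidates
# is tried together with the predictors already selected
forward = sum(n_candidates - k for k in range(n_candidates))

print(exhaustive)  # 512
print(forward)     # 45
```

Because each forward step evaluates a candidate in the context of the predictors already chosen, the search does take some joint effects into account, although it visits only 45 of the 512 possible subsets.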

It would be great if anyone who knows this could help me out here. @paul.buerkner knows for sure, so I hope this message finds him!



I think this is a perfect example of where projpred is justified: it does not yet say anything about causality, only about predictive information, and it helps decide what to measure in the future.

Also, selecting among many models with LOO-CV alone has higher variability, as shown in the projpred-related papers.

Yes. You can also force some predictors to always be included, if needed.


Thanks so much, Aki! That makes me confident that I can use it for this purpose!