Projpred interpretation

Hi all,

I am using projpred to reduce my predictor set. One of my colleagues raised a concern that I would like to get some expert opinion on.

The concern is about correlated predictors and the risk of over-interpreting selected variables mechanistically or causally. In a previous PCA-based workflow, shared information among correlated predictors was distributed across components, which naturally encouraged cautious interpretation. In contrast, projection predictive selection returns a sparse model with individual predictors, which can create the impression that the selected variable is uniquely important, even when several correlated predictors may carry largely interchangeable predictive information.

More specifically, if two predictors are strongly correlated, the fact that projpred selects one over the other could partly reflect the forward search procedure and predictive redundancy, rather than evidence for a uniquely causal or mechanistic role. A reviewer could argue that replacing the selected predictor with a correlated alternative might yield very similar predictive performance while suggesting a different interpretation.

Therefore, I am interested in how users of projpred justify the interpretation of selected predictors in the presence of substantial collinearity, and whether there are recommended ways to frame or supplement the analysis to avoid overstatement. I do start from a broader predictor set in which every predictor may have a mechanistic link to the response.

Thanks in advance for any pointers!

Hi @ViktorVdV,

Projection predictive feature selection was developed for prediction tasks (see, e.g., Vehtari and Ojanen, 2012, and Piironen et al., 2020). Thus, it targets minimal subset feature selection problems, not complete feature selection problems (this is nicely explained in Piironen et al., 2020, but also in Pavone et al., 2022, for example). Hence, projpred does not try to find all predictors related to the outcome, but only a subset of the predictors that is as small as possible while still achieving a predictive performance that is as good as possible. In particular, when two predictors are strongly correlated, the fact that one of them is selected does not imply that the other is irrelevant, so the selected subset should be read as "sufficient for prediction", not as a list of uniquely important or causal predictors.
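
To make the minimal-subset point concrete, here is a small sketch in plain Python. It does not use projpred and it is not the projection predictive method: it is an ordinary least-squares forward search on simulated data, scored by in-sample R^2, and every name in it (`r_squared`, `forward_search`, `x1`, `x2`, `x3`) is made up for illustration. The assumed setup is two predictors that are noisy copies of the same latent signal plus one unrelated predictor.

```python
import random


def r_squared(columns, y):
    """In-sample R^2 of an ordinary least-squares fit of y on the given columns."""
    n = len(y)
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    # Center each predictor so that no explicit intercept is needed.
    xc = []
    for col in columns:
        m = sum(col) / n
        xc.append([v - m for v in col])
    k = len(xc)
    # Normal equations (X'X) beta = X'y, stored as an augmented matrix.
    a = [[sum(p * q for p, q in zip(xc[i], xc[j])) for j in range(k)]
         + [sum(p * q for p, q in zip(xc[i], yc))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for i in range(k):
        pivot = max(range(i, k), key=lambda r: abs(a[r][i]))
        a[i], a[pivot] = a[pivot], a[i]
        for r in range(k):
            if r != i:
                f = a[r][i] / a[i][i]
                a[r] = [u - f * v for u, v in zip(a[r], a[i])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    fitted = [sum(b * xc[j][t] for j, b in enumerate(beta)) for t in range(n)]
    ss_res = sum((u - v) ** 2 for u, v in zip(yc, fitted))
    ss_tot = sum(u ** 2 for u in yc)
    return 1.0 - ss_res / ss_tot


def forward_search(predictors, y):
    """Greedy forward search: at each step add the predictor with the best R^2."""
    selected, path = [], []
    remaining = list(range(len(predictors)))
    while remaining:
        scores = {j: r_squared([predictors[i] for i in selected + [j]], y)
                  for j in remaining}
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
        path.append((best, scores[best]))
    return path


# Simulated data (an assumption for illustration, not real data).
rng = random.Random(1)
n = 2000
z = [rng.gauss(0, 1) for _ in range(n)]        # latent signal
x1 = [v + 0.1 * rng.gauss(0, 1) for v in z]    # noisy copy of z
x2 = [v + 0.1 * rng.gauss(0, 1) for v in z]    # second, nearly interchangeable copy
x3 = [rng.gauss(0, 1) for _ in range(n)]       # unrelated predictor
y = [v + 0.5 * rng.gauss(0, 1) for v in z]

path = forward_search([x1, x2, x3], y)
r2_x1 = r_squared([x1], y)
r2_x2 = r_squared([x2], y)
r2_both = r_squared([x1, x2], y)

print("search path (index, cumulative R^2):", path)
print("R^2 with x1 only:", round(r2_x1, 3))
print("R^2 with x2 only:", round(r2_x2, 3))
print("R^2 with x1 and x2:", round(r2_both, 3))
```

In this setup the search picks one of the two correlated predictors first, the other one alone predicts almost equally well, and adding it afterwards barely improves the fit. That is exactly the situation in which a sparse selection is a fine answer to "what do I need for prediction?" but a poor answer to "which predictor is the mechanistically important one?".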

Does that help?

Thank you, this is indeed very helpful.