@fweber144 I wanted to thank you for the informative video you shared about how to use the projpred package. I found it very helpful. I also read the following paper, which includes an example and R code: Projection predictive variable selection – A review and recommendations for the practicing statistician
However, I have a few questions that I was hoping you could clarify:
- cv_varsel and Plot Function: Using `cv_varsel` and the corresponding plot function, the line under the graphic provides information about the corresponding predictor from the full-data predictor ranking and the corresponding main diagonal element from the CV ranking proportions matrix. I don’t quite understand what this information conveys. My initial thought is that with a fixed number of variables, the best model is identified through projection, ensuring it includes this specific number of variables. Using cross-validation (CV), this process is repeated across different datasets, each missing one observation. The output then shows the proportion of times each variable was selected across these datasets. Could you please confirm if this is correct or provide further clarification?
- Interactions in Default Search: Are interactions included in the default search when using `cv_varsel`?
- Final Model in rstanarm: For the final model, if I select, for example, five predictors, is there any issue with estimating the final model with these five predictors in `rstanarm`? I am considering this approach to see if I can improve predictive accuracy by choosing a more flexible functional form.
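For concreteness, the workflow these questions refer to can be sketched as follows. This is only a minimal sketch under stated assumptions: it uses the `df_gaussian` example data shipped with projpred, a Gaussian `stan_glm()` reference model with main effects only, and a recent projpred version that provides `ranking()` and `cv_proportions()`. The object names (`dat`, `refm_fit`, `cvvs`, `size_sel`, `final_fit`) and the submodel size of five are placeholders for illustration.

```r
library(rstanarm)
library(projpred)

# Example data from projpred (assumption: stands in for the real data set).
data("df_gaussian", package = "projpred")
dat <- data.frame(y = df_gaussian$y, df_gaussian$x)  # predictors X1, ..., X20

# Reference model with the candidate predictors (main effects only here).
refm_fit <- stan_glm(
  y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8,
  data = dat, family = gaussian(),
  chains = 2, iter = 1000, seed = 1, refresh = 0
)

# Cross-validated variable selection; with validate_search = TRUE the
# search is repeated within the CV folds (this can take a while with LOO).
cvvs <- cv_varsel(
  refm_fit, cv_method = "LOO", method = "forward",
  nterms_max = 6, validate_search = TRUE, seed = 1
)

# Performance plot; the text below the x-axis is the information asked
# about in the first question.
plot(cvvs, stats = "mlpd", deltas = TRUE)

# The quantities behind that text.
rk <- ranking(cvvs)
print(rk$fulldata)           # full-data predictor ranking
pr_rk <- cv_proportions(rk)  # CV ranking proportions matrix
print(diag(pr_rk))           # its main diagonal elements

# Third question: refit a model with the first `size_sel` predictors
# (size_sel = 5 is a hypothetical choice, not a recommendation).
size_sel <- 5
form_sel <- reformulate(head(rk$fulldata, size_sel), response = "y")
final_fit <- stan_glm(
  form_sel, data = dat, family = gaussian(),
  chains = 2, iter = 1000, seed = 1, refresh = 0
)
```

The last step is exactly the one the third question is about: `final_fit` is a new rstanarm fit on the selected predictors, not the projected submodel produced by projpred.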
Thank you in advance for your help and guidance.