Usage and Interpretation of the cv_varsel Function

@fweber144 I wanted to thank you for the informative video you shared about how to use the projpred package. I found it very helpful. Moreover, I read the following paper, which includes an example and R code: Projection predictive variable selection – A review and recommendations for the practicing statistician
However, I have a few questions that I was hoping you could clarify:

  1. cv_varsel and Plot Function: Using cv_varsel and the corresponding plot function, the line under the graphic provides information about the corresponding predictor from the full-data predictor ranking and the corresponding main diagonal element from the CV ranking proportions matrix. I don’t quite understand what this information conveys. My initial thought is that with a fixed number of variables, the best model is identified through projection, ensuring it includes this specific number of variables. Using cross-validation (CV), this process is repeated across different datasets, each missing one observation. The output then shows the proportion of times each variable was selected across these datasets. Could you please confirm if this is correct or provide further clarification?
  2. Interactions in Default Search: Are interactions included in the default search when using cv_varsel?
  3. Final Model in rstanarm: For the final model, if I select, for example, five predictors, is there any issue with estimating the final model with these five predictors in rstanarm? I am considering this approach to see if I can improve predictive accuracy by choosing a more flexible functional form.

Thank you in advance for your help and guidance.

Hi Mariana,

Thank you; I hope I can clarify these questions.

  1. Yes, your description is essentially correct. Just to formulate a few things more precisely:
    • With a fixed number of variables (which is not possible in case of L1 search, btw), the best model is identified by first projecting the reference model onto each candidate model of that size and then selecting the candidate model with minimal “expected Kullback-Leibler divergence” from the reference model (i.e., selecting the candidate model with minimal $\delta_{\mathrm{KL}}^{\mathbb{E}} = \sum_{c = 1}^{C} w_{c} \frac{1}{N} \sum_{i = 1}^{N} D_{\mathrm{KL}}\!\left( p(\tilde{y}_i \mid \mathcal{I}^{*}_{c}) \:\middle\|\: p(\tilde{y}_i \mid \boldsymbol{\theta}_{c}) \right)$, using the notation from Projection predictive variable selection for discrete response families with finite support | Computational Statistics and letting $w_{c}$ denote the weight of cluster $c$).

    • Your statement

      Using cross-validation (CV), this process is repeated across different datasets, each missing one observation.

      is correct if “this process” means the whole search procedure (i.e., from model size 0 up to model size nterms_max), not the selection of the best model at a fixed number of predictors. Btw, “each missing one observation” refers to LOO CV, but there is also K-fold CV in projpred. You were probably aware of that, but I want to highlight this for other readers.

    • In

      The output then shows the proportion of times each variable was selected across these datasets.

      I would add “at this specific position in the predictor ranking” at the end. But this only holds for cumulate = FALSE (see Plot predictive performance — plot.vsel • projpred). In case of cumulate = TRUE, this would have to be changed to “up to this specific position in the predictor ranking” (the bottom line of the x-axis label would then also mention “cumulated” in front of “CV ranking proportions matrix”). See the code sketch after this list for an illustration of both settings.

  2. Interactions are included in the default search if they are included in the reference model formula. (More precisely, interactions are included in the default search if they are included in argument formula of init_refmodel(), but since this usually coincides with the reference model formula (users rarely call init_refmodel() themselves), I simplified this to the reference model formula.)
  3. In general, this is undesirable because the projected posterior for a given submodel (in particular, for the final submodel) retains uncertainty from the reference model’s posterior, which would be lost when working with the posterior of a refitted (final) submodel. Only when performing post-selection inference using the projected posterior of the final submodel will you get valid post-selection inference (more precisely, “approximately valid”, because minor overfitting could occur in the selection of the final submodel size). In your case, if you are concerned about the functional form, you could consider projpred’s experimental support for GAMs (or GAMMs, if you also have a multilevel structure). For this, see the section on additive models in the vignette Projection predictive feature selection • projpred.
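To make point 1 more concrete, here is a minimal sketch of the workflow discussed above (the data frame dat, the outcome y, and the predictors x1 to x4 are hypothetical, and ranking() / cv_proportions() require a recent projpred version):

```r
library(rstanarm)
library(projpred)

# Reference model (hypothetical data and formula):
refm_fit <- stan_glm(y ~ x1 + x2 + x3 + x4, data = dat,
                     family = gaussian(), refresh = 0)

# CV variable selection (LOO CV by default; K-fold CV via cv_method = "kfold"):
cvvs <- cv_varsel(refm_fit)

# Predictive performance plot; the bottom line of the x-axis shows the
# full-data predictor ranking and the main diagonal of the CV ranking
# proportions matrix:
plot(cvvs, stats = "elpd", deltas = TRUE)

# With cumulate = TRUE, the proportions refer to selection *up to* (not *at*)
# each position in the ranking:
plot(cvvs, stats = "elpd", deltas = TRUE, cumulate = TRUE)

# The CV ranking proportions matrix itself:
rk <- ranking(cvvs)
cv_proportions(rk)
cv_proportions(rk, cumulate = TRUE)
```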

Dear Frank,

Thank you for the helpful clarifications. I still have a few questions concerning the allowed functional forms in the reference model. You mentioned that it can include interactions; is it also possible to use the poly function to add quadratic terms? Furthermore, if I add interactions or quadratic terms, are the main effects or linear parts automatically included if the interaction or quadratic term is included?

Additionally, how should interactions be included in the formula? Should I use the interaction function, or simply use * between the terms?

Best regards, Mariana

I’ll start with the last question:

Additionally, how should interactions be included in the formula? Should I use the interaction function, or simply use * between the terms?

Simply use *. You can also use :, but then it usually makes sense to include all lower-order interaction terms (including the involved main effect terms) as well (* does this automatically).
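For illustration, here is a tiny sketch with hypothetical predictors x1 and x2; in R's formula syntax, the two formulas are equivalent:

```r
# `*` expands to the main effects plus their interaction ...
y ~ x1 * x2
# ... which is the same as spelling the terms out with `:`:
y ~ x1 + x2 + x1:x2
```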

is it also possible to use the poly function to add quadratic terms?

Yes, it is. There have been some issues with such terms, but they should be fixed; see PR #409.

Furthermore, if I add interactions or quadratic terms, are the main effects or linear parts automatically included if the interaction or quadratic term is included?

There are two things you could mean by “included”:

  • “included” in the reference model formula (more precisely, argument formula of init_refmodel(), see my last post),
  • “included” in a submodel formula (during projpred’s search).

If “included” relates to the reference model formula, this is a general question about such formula terms in R, but I hope I can answer it accurately:

A poly() term always includes all lower-degree terms (see ?poly). This is not the case when constructing the degree-wise terms manually (e.g., I(x^2)). For interactions, see my comment on * and : above.
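As a small illustration (with a hypothetical predictor x), both formulas below contain a linear and a quadratic part, but only the second one spells out the degree-wise terms manually:

```r
# poly(x, 2) bundles the (orthogonal) linear and quadratic terms:
y ~ poly(x, 2)
# Manually constructed degree-wise (raw) terms:
y ~ x + I(x^2)
```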

If “included” relates to projpred’s search, note the following on poly() terms from the NEWS entry corresponding to PR #409:

Note that just like step() and MASS::stepAIC(), projpred’s search algorithms do not split up a poly() or polym() term into its lower-degree polynomial terms (which would be helpful, for example, if the linear part of a poly() term with degrees = 2 was relevant but the quadratic part not). Such a split-up of a poly() or polym() term needs to be performed manually (if desired).

So a poly() term (including all lower-degree terms) is either fully included or fully excluded during the search. You would need to construct the degree-wise terms manually if you want to allow them to be included / excluded separately. Manually constructed degree-wise terms (e.g., I(x^2)) are handled by projpred just like any other formula terms. For an interaction term, projpred ensures that all of its lower-order interaction terms (including the involved main effect terms) are selected first (in other words, an interaction term is only considered for selection if all of its lower-order interaction—or main effect—terms have already been selected).
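So if you want the search to be able to treat the linear and the quadratic part of a (hypothetical) predictor x separately, you could write the degree-wise terms out manually in the reference model formula, for example (again with hypothetical data dat):

```r
# x and I(x^2) can now be selected independently during the search (a
# poly(x, 2) term would only be included or excluded as a whole); the
# interaction x1:x2 is only considered once x1 and x2 have been selected:
refm_fit <- stan_glm(y ~ x + I(x^2) + x1 * x2, data = dat,
                     family = gaussian(), refresh = 0)
cvvs <- cv_varsel(refm_fit)
```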


Thank you very much!
