Projpred: Behavior of argument search_terms

Recent projpred versions (>= 2.0.0) have gained the argument search_terms in cv_varsel() and varsel(). My question concerns the terms that are not included in search_terms: Are they always included in the submodels? For example, consider the reference model formula y ~ (1 | subject) + x1 + x2 + x3. If I then specify search_terms = c("x2", "x3"), does that mean that y ~ (1 | subject) + x1 is taken as the “baseline” model whose terms are always included during the search process? Or is that “baseline” model given by y ~ 1, meaning that the terms (1 | subject) and x1 are excluded?

Perhaps @AlejandroCatalina can help with this?

If you specify search_terms = c("x2", "x3"), then the terms (1 | subject) and x1 are excluded. Put simply, the search considers only the terms in search_terms, adding them one at a time until all of them are included in the projection.

I am nonetheless working on improving the documentation for this argument and for custom reference models, so I will hopefully upload a couple of new vignettes soon.

Thanks for the question!


Sorry, but I still don’t really understand: When you say

then the terms (1 | subject) and x1 are excluded

do you just mean “excluded from the list of candidate predictors” or “excluded from the list of candidate predictors and also excluded from the candidate models”?

Excluded from the list of candidate terms and from the candidate models: the candidate models range from the intercept-only model up to the model that includes all terms contained in search_terms. Basically, search_terms defines the space of submodels to explore.
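To make that concrete, here is a minimal base-R sketch of the submodel space implied by search_terms = c("x2", "x3"). Note that candidate_formulas() is a hypothetical helper written only for illustration; it is not a projpred function, and the response name y is taken from the example above. The actual search does not enumerate every subset; it follows a path through this space, but no model on that path can contain a term outside it.

```r
# Hypothetical helper (NOT part of projpred): list every submodel formula
# that consists of an intercept plus some subset of `terms`.
candidate_formulas <- function(terms, response = "y") {
  # Start with the empty subset, i.e. the intercept-only model.
  subsets <- list(character(0))
  # Append all subsets of size 1, 2, ..., length(terms).
  for (k in seq_along(terms)) {
    subsets <- c(subsets, combn(terms, k, simplify = FALSE))
  }
  # Turn each subset into a formula string.
  vapply(subsets, function(s) {
    rhs <- if (length(s) == 0) "1" else paste(s, collapse = " + ")
    paste(response, "~", rhs)
  }, character(1))
}

cands <- candidate_formulas(c("x2", "x3"))
print(cands)
```

None of the listed formulas contains (1 | subject) or x1, which is what "excluded from the candidate models" means here.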


I just checked some basic usage of this argument and I want to add to the previous response that, if specified, search_terms must include the intercept, so the proper syntax for the above example would be search_terms = c("1", "x2", "x3"). I’m working on adding an argument that is specifically designed for fixing terms across the submodels, but for now we’ll have to work with search_terms only.
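For reference, a rough end-to-end sketch of such a call might look as follows. The simulated data, the use of rstanarm::stan_glmer() for the reference model, and the sampler settings are all assumptions made for illustration; only the formula and the search_terms value come from this thread. The intercept requirement reflects the behavior described above and may differ in other projpred versions.

```r
library(rstanarm)
library(projpred)

# Simulated data (an assumption for illustration only).
set.seed(1)
n_subj <- 10
n_per <- 6
dat <- data.frame(
  subject = factor(rep(seq_len(n_subj), each = n_per)),
  x1 = rnorm(n_subj * n_per),
  x2 = rnorm(n_subj * n_per),
  x3 = rnorm(n_subj * n_per)
)
subj_eff <- rnorm(n_subj)
dat$y <- subj_eff[as.integer(dat$subject)] +
  0.5 * dat$x1 + dat$x2 + rnorm(nrow(dat))

# Reference model with the formula from the question.
fit <- stan_glmer(
  y ~ (1 | subject) + x1 + x2 + x3,
  data = dat, family = gaussian(),
  chains = 2, iter = 1000, refresh = 0
)

# Restrict the search to the intercept, x2, and x3.
# (1 | subject) and x1 are then neither candidates nor part of any submodel.
vs <- varsel(fit, method = "forward", search_terms = c("1", "x2", "x3"))
print(vs)
```

With this call, the largest submodel visited should be y ~ x2 + x3 rather than the full reference model formula.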
