Projpred: ensuring selection of a subset of variables

I’m reading the incredibly useful body fat projpred vignette, which compares the performance of projpred with that of the methods of Heinze, Wallisch, and Dunkler (2017). In the section on prediction selection stability, Heinze et al. (2017) reportedly ensured that abdomen and height were always included in their models. I’m wondering whether a similar approach could be used with projpred. I ask because in much biomedical research involving variable selection, it is of great interest to always include standard, conventional, or easy-to-obtain predictors such as age and gender.

I also wish to take this opportunity to thank the projpred developers for creating and sharing projpred with the rest of the world!

@AlejandroCatalina

Sorry for the delay; @AlejandroCatalina has also been busy with his new job.

There is an argument search_terms that controls which terms are part of the search, so you can use it to exclude the desired variables from the search. There has also been discussion about allowing users to define terms that will always be fixed.

I have collected this and similar questions here. Perhaps the connection to related threads helps until this is resolved in projpred.
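As a rough sketch of the search_terms idea above (not an official recipe): one way to keep certain terms in every candidate submodel is to build the term blocks so that each block contains them. The names forced, optional, and st_forced below are made up for illustration, and the predictors are borrowed from the mtcars data. Recent projpred releases also appear to ship a helper, force_search_terms(), that builds the same kind of vector, but please check that it exists in your installed version before relying on it.

```r
## Sketch only: build a `search_terms` vector in which `vs` and `drat`
## are part of every candidate term block (hypothetical variable names).
forced   <- c("vs", "drat")                      # terms to keep in every block
optional <- c("hp", "wt", "qsec", "am", "cyl")   # terms left to the search

forced_block <- paste(forced, collapse = " + ")  # "vs + drat"
st_forced <- c(forced_block,
               paste(forced_block, optional, sep = " + "))
print(st_forced)

## Assumed usage (requires a fitted reference model, here called `refmodel_fit`):
## cv_varsel(refmodel_fit, method = "forward", search_terms = st_forced)
```

Note that search_terms is documented as relevant for forward search only, hence method = "forward" in the commented call.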


Thank you for taking the time to respond to my query! I didn’t reply quickly because I was (and am) having difficulty getting search_terms to work. A minimal example is below:

library(brms)
library(projpred)

## reference model fitted on the built-in mtcars data
modbrms <- brm(mpg ~ vs + hp + wt + qsec + am + cyl + drat,
               prior = set_prior("normal(0,3)"),  ## very arbitrary
               family = gaussian(),
               data = mtcars)

## suppose I want projpred to always select `vs` and `drat`
st <- c("1", "vs", "vs + drat")

## cv_varsel() balks with the error message
## "Error in sub["kl", i] : incorrect number of dimensions"
modvarsel_vs_fwd <-
  cv_varsel(modbrms,
            method = "forward",
            search_terms = st)

## remove the `method` arg: this runs without error
modvarsel_vs_fwd <-
  cv_varsel(modbrms,
            search_terms = st)

## `vs` is ranked last and `drat` 4th
modvarsel_vs_fwd$solution_terms
## [1] "wt"   "cyl"  "hp"   "drat" "am"   "vs"   "qsec"

Again, thank you for taking the time to provide guidance!

Thank you for collating all threads related to search_terms - very useful!

I’d like to bump this, since I have the same question @AlejandroCatalina.

@yhpua, the reason you don’t see any effect of search_terms when omitting the method argument is that search_terms only takes effect in a forward search. In your case, the default method = NULL leads to an L1 search, so search_terms is ignored. This seems to happen frequently, see here (a remedy is on the way).

Concerning the error sub["kl", i] : incorrect number of dimensions: Could you try this again with projpred’s most recent CRAN (or GitHub, shouldn’t matter) version? I don’t get such an error when running your example.
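In case it helps with comparing versions, here is a small sketch for reporting which projpred version is installed. It only uses base R functions; has_projpred is just an illustrative variable name.

```r
## Report the installed projpred version, if the package is available.
has_projpred <- requireNamespace("projpred", quietly = TRUE)

if (has_projpred) {
  print(packageVersion("projpred"))
} else {
  message("projpred is not installed; install.packages(\"projpred\") would fetch the CRAN release.")
}
```

Posting that version number together with the error makes it easier to tell whether the problem is already fixed in a newer release.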
