Still confused by search_terms in projpred

andymilne · July 6, 2021, 6:23am

Hi @AlejandroCatalina, I have a brms model (Bernoulli family), which is a simpler version of what I was describing in Advice on using search_terms in projpred. The formula is:

Y ~ (X1 + X2 + X3 + X4 ) * F1 * F2

X1 to X4 are four different continuous predictors, F1 and F2 are factors with 2 and 5 levels respectively.

I want to use projpred to help determine which of the X predictors should be included in the model because they improve predictions in any of the 2 x 5 = 10 conditions of the experiment. This means that whenever any X is included, I also want to include all of its interactions with F1 and F2. In an attempt to do this in projpred, I have specified:

search_terms = c(
"1", 
"F1 + F2 + F1:F2 + X1 + X1:F1 + X1:F2 + X1:F1:F2",
"F1 + F2 + F1:F2 + X2 + X2:F1 + X2:F2 + X2:F1:F2",
"F1 + F2 + F1:F2 + X3 + X3:F1 + X3:F2 + X3:F1:F2",
"F1 + F2 + F1:F2 + X4 + X4:F1 + X4:F2 + X4:F1:F2")

However, when I run vs <- varsel(mdl, search_terms = search_terms), vs$solution_terms shows 19 entries, each of which is a single item (e.g., F1, F2, X1:F1), and not the four composite entries provided in search_terms. I have checked that the variable names in search_terms match the names in the model formula, so I don’t understand what’s happening here. Have I made a mistake with the syntax, or am I misunderstanding the output of vs$solution_terms?
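
The composite search_terms vector above can also be built programmatically, which avoids copy-and-paste slips in the variable names. This is only a sketch in base R: build_search_terms is a hypothetical helper, not part of projpred, and it assumes the predictors and factors are named exactly as in the formula above.

```r
# Hypothetical helper (not part of projpred): builds one composite search term
# per continuous predictor. Each term bundles the factor main effects, the
# factor interaction, the predictor, and the predictor's interactions with
# the factors.
build_search_terms <- function(predictors, f1 = "F1", f2 = "F2") {
  factor_part <- paste(f1, f2, paste0(f1, ":", f2), sep = " + ")
  composite <- vapply(
    predictors,
    function(x) {
      paste(
        factor_part,
        x,
        paste0(x, ":", f1),
        paste0(x, ":", f2),
        paste0(x, ":", f1, ":", f2),
        sep = " + "
      )
    },
    character(1),
    USE.NAMES = FALSE
  )
  # Prepend the intercept-only term, as in the hand-written vector.
  c("1", composite)
}

search_terms <- build_search_terms(paste0("X", 1:4))
print(search_terms)
```

The result should be identical to the hand-written vector, so it does not by itself change how varsel treats the terms.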

AlejandroCatalina · July 6, 2021, 9:46am

Hello! Thanks for the follow-up, I’ll try this locally to identify where the issue might be. As far as I can see on the phone this should work all right, but of course search_terms has not been used a lot and probably has not been tested against all the edge cases, so it might be buggy :).

andymilne · July 6, 2021, 9:53am

The factors are sum-coded (contr.sum) in case that’s relevant.

AlejandroCatalina · July 12, 2021, 2:47pm

Can you provide a reproducible example with these conditions so I can test and debug this? Thanks and sorry for any inconvenience.

hoddoi · July 15, 2021, 12:42pm

Hi there -

I have the same problem and am confused by the syntax. I have a model with, say, 20 variables, but if I specify varsel(mod, method = "forward", search_terms = c("1", "X1", "X2")) I get back a vs object with all terms included and not just the ones I specified.

I was trying to create a reproducible example using the rstanarm logistic regression example at Bayesian Logistic Regression with rstanarm. Using the post2 model on this page I attempt to limit the search as following:

varsel2 <- varsel(post2, method = "forward", data = diabetes, search_terms = c("1", "glucose", "bloodpressure"))
However I get this error:
[1] "10% of terms selected."
[1] "20% of terms selected."
Error in sub["kl", i] : incorrect number of dimensions

Ideally I want to get to the situation where I can make sure one variable is always entered last. I think this has been done on another post but I can’t work out the correct syntax from the documentation and it would be really helpful if I could get a pointer.

Great work on this package by the way.
All the best
Jon

AlejandroCatalina · July 16, 2021, 6:30am

What might be happening in this case is that projpred expects search_terms to contain all of the variables in the model formula. Your syntax is correct here. You can pass nterms=2 to let it know that only 2 terms are included. I will change projpred to set nterms automatically to the number of variables passed in search_terms when it is provided. Thanks for noticing!
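
A minimal sketch of this workaround follows. It assumes that the argument meant here is the one spelled nterms_max in varsel's documentation, and that the intercept term "1" should not be counted. Both count_search_terms and run_limited_search are hypothetical helpers, not part of projpred.

```r
# Hypothetical helper: number of search terms, not counting the intercept "1".
count_search_terms <- function(search_terms) {
  length(setdiff(search_terms, "1"))
}

# Hypothetical wrapper: forward search capped at the number of supplied terms.
# Assumption: `nterms_max` is the varsel() argument referred to as "nterms".
run_limited_search <- function(fit, search_terms) {
  projpred::varsel(
    fit,
    method = "forward",
    nterms_max = count_search_terms(search_terms),
    search_terms = search_terms
  )
}

count_search_terms(c("1", "glucose", "bloodpressure"))
```

With the post2 model from the rstanarm example, the call would be run_limited_search(post2, c("1", "glucose", "bloodpressure")); whether this avoids the dimension error depends on the projpred version.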

andymilne · July 17, 2021, 7:17am

Hi @AlejandroCatalina, I thought I would recreate this with a new data frame with the simpler variable names (X1, X2, … F1, F2) as above. But now I can’t even get varsel to complete, so I don’t know what’s going on. I get the error message Error in eval(predvars, data, env) : object 'F2i' not found. This is the code I used (using v.2.0.2 of projpred):

library(brms)

# Reference model: Bernoulli (logit) regression with all predictor-by-factor interactions.
mdl_projpred <-
  brm(
    Y ~ (X1 + X2 + X3 + X4) * F1 * F2,
    data = data,
    family = bernoulli(link = "logit"),
    prior = c(set_prior("student_t(3, 0, 1)", class = "Intercept"),
              set_prior("student_t(3, 0, 1)", class = "b")),
    sample_prior = "yes",
    save_pars = save_pars(all = TRUE)
  )
mdl_projpred

library(projpred)
search_terms = c(
  "1",
  "F1 + F2 + F1:F2 + X1 + X1:F1 + X1:F2 + X1:F1:F2",
  "F1 + F2 + F1:F2 + X2 + X2:F1 + X2:F2 + X2:F1:F2",
  "F1 + F2 + F1:F2 + X3 + X3:F1 + X3:F2 + X3:F1:F2",
  "F1 + F2 + F1:F2 + X4 + X4:F1 + X4:F2 + X4:F1:F2"
)
refmodel <- get_refmodel(mdl_projpred)
vs <- varsel(
  refmodel,
  search_terms = search_terms)

I can make the data set available, if useful.

AlejandroCatalina · July 20, 2021, 7:32am

Yes, it would be useful to have the data frame available so I can run and debug the example myself. Thanks and sorry for the delay!

andymilne · July 21, 2021, 10:47am

Thanks Alejandro – no worries – I’ll email you shortly.

fweber144 · April 25, 2022, 11:05am

@andymilne, I don’t know if your problem got solved by the personal communication with @AlejandroCatalina, but if not, perhaps you’re experiencing the same issue as described here, i.e., having an L1 search? The search_terms argument only takes effect in case of a forward search.
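
One way to rule this out is to request the forward search explicitly rather than relying on the default method. A minimal sketch, assuming refmodel and search_terms are the objects created earlier in this thread; run_forward_search is a hypothetical wrapper, not part of projpred:

```r
# Hypothetical wrapper: runs varsel() with an explicit forward search so that
# `search_terms` is taken into account.
run_forward_search <- function(refmodel, search_terms) {
  projpred::varsel(
    refmodel,
    method = "forward",
    search_terms = search_terms
  )
}

# Usage, given the objects from the earlier post:
# vs <- run_forward_search(refmodel, search_terms)
# vs$solution_terms
```

If vs$solution_terms then lists the composite entries instead of 19 single terms, the earlier result most likely came from an L1 search.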

andymilne · April 27, 2022, 12:52am

I don’t remember – I’ll check in the next week or so…
