Advice on using search_terms in projpred

I am seeking advice on how to create an appropriate search_terms argument in projpred in order to ensure that when certain predictors are included certain other predictors are also included. I have not been able to find any formal documentation of exactly how search_terms works or its syntax, hence my question.

I have a Bernoulli family model of a binary outcome Y and there are continuous predictors A1, A2, B1, B2, C1, C2, and a binary factor F with two levels whose non-reference level is denoted F1. The model formula is Y ~ (A1 + A2 + B1 + B2 + C1 + C2) * F.
I want to apply the following constraints:

  1. A1 and A2 always come as a pair, as do B1 and B2, as do C1 and C2
  2. If A1 + A2 is included, (A1 + A2):F1 must also be included; same with B1 + B2 and (B1 + B2):F1; same with C1 + C2 and (C1 + C2):F1

Is this the type of thing that is possible using search_terms? If so, how would these constraints be coded?

Many thanks in advance for any clues!

Unfortunately, I don’t know much about `projpred`, but I’m tagging @jpiironen, who hopefully knows more and has time to answer.

Hi, I’m no longer actively participating in projpred development. I’m tagging @AlejandroCatalina; hopefully he can answer this one.



This is possible with search_terms, although it is a bit cumbersome to write. Basically, you have to think of the elements of search_terms as formula building blocks, so that each member indicates a valid submodel. You can only explore submodels that are grown from at least one other member. Let me give an example:

Let’s assume, for the sake of the example, that we only have A1 + A2 + B1 from your problem, and we want to keep A1 and A2 always together. The search_terms for this would be

search_terms <- c("1", "A1 + A2", "A1 + A2 + B1")

so that B1 can only be included after A1 + A2 has been included. In your example you are reusing terms a lot, so I would advise you to build search_terms using paste and stored variables such as t1 <- "A1 + A2", t2 <- "B1 + B2", etc. I see you have some interaction requirements as well; you can have your minimal building term be (A1 + A2):F1 in your case, as you want all of this included if A1 + A2 is included.
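As a rough sketch of the paste idea, here is the small A1 + A2 + B1 example rebuilt from a stored variable. Only base R is used; the name t1 follows the suggestion above, and nothing here calls projpred itself.

```r
# Store the pair that must always enter together
t1 <- "A1 + A2"

# Grow the candidate submodels from the stored block:
# intercept only, then the pair, then the pair plus B1
search_terms <- c("1", t1, paste(t1, "B1", sep = " + "))

print(search_terms)
```

The resulting character vector is what would then be passed as the search_terms argument.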

Does this help at all?

Thanks for the question, this is an interesting one!


Thanks, @AlejandroCatalina – the growing of terms makes sense. But, for the interaction, would having just (A1 + A2):F1 in search_terms allow for the model to also have the lower order effects of A1 and A2 as well? You imply it would but I am struggling to understand the logic of the syntax used here because : usually refers only to the interaction and not the lower-order effects as well. It’s almost like (A1 + A2):F1 in search_terms means (A1 + A2)*F1 or A1 + A2 + (A1 + A2):F1, but maybe I am completely misunderstanding the syntax here.


You’re absolutely right, I made a mistake. You can indeed introduce A1 + A2 + (A1 + A2):F as the building block.
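For the full problem, a sketch along these lines might work. It assumes that every combination of the three blocks should be listed as a valid submodel, and that projpred accepts the parenthesised interaction as written; I have not verified either, so it may be necessary to expand it to A1:F + A2:F. The helper make_block is a made-up name for illustration, and the main effect of F is left out because it was not part of the constraints.

```r
# Hypothetical helper: a pair of main effects plus their interaction with f
make_block <- function(x1, x2, f = "F") {
  main <- paste(x1, x2, sep = " + ")
  paste0(main, " + (", main, "):", f)
}

# One block per pair of predictors
blocks <- c(
  make_block("A1", "A2"),
  make_block("B1", "B2"),
  make_block("C1", "C2")
)

# Every non-empty combination of blocks, joined with " + "
combos <- unlist(lapply(seq_along(blocks), function(k) {
  combn(blocks, k, FUN = paste, collapse = " + ")
}))

# Intercept-only model first, then all grown submodels
search_terms <- c("1", combos)

print(search_terms)
```

This gives eight candidates: the intercept-only model plus the seven non-empty combinations of the A, B and C blocks.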


Great – all makes sense now. Thank you.