I am seeking advice on how to create an appropriate `search_terms` argument in projpred in order to ensure that when certain predictors are included, certain other predictors are also included. I have not been able to find any formal documentation of exactly how `search_terms` works or its syntax, hence my question.
I have a Bernoulli family model of a binary outcome `Y`, with continuous predictors `A1`, `A2`, `B1`, `B2`, `C1`, `C2`, and a binary factor `F` with two levels whose non-reference level is denoted `F1`. The model formula is `Y ~ (A1 + A2 + B1 + B2 + C1 + C2) * F`.
I want to apply the following constraints:
- `A1` and `A2` always come as a pair, as do `B1` and `B2`, as do `C1` and `C2`.
- If `A1 + A2` is included, `(A1 + A2):F1` must also be included; same with `B1 + B2` and `(B1 + B2):F1`; same with `C1 + C2` and `(C1 + C2):F1`.
Is this the type of thing that is possible using `search_terms`? If so, how would these constraints be coded?
Many thanks in advance for any clues!
Unfortunately, I don’t know much about `projpred`, but I am tagging @jpiironen, who hopefully knows more and has time to answer.
Hi, I’m no longer actively participating in projpred development. I’m tagging @AlejandroCatalina, hopefully he can answer this one.
Hi,
This is possible to do with `search_terms`, although it is a bit cumbersome to write. Basically, you have to think of `search_terms` as formula-building elements, so that each member indicates a valid submodel. You can only explore submodels that are grown from at least one other member. Let me give an example:
Let’s assume, for the sake of the example, that we only have `A1 + A2 + B1` from your problem, and we want to keep `A1` and `A2` always together. The `search_terms` for this would be
search_terms <- c("1", "A1 + A2", "A1 + A2 + B1")
so that `B1` can only be included after including `A1 + A2`. In your example you are reusing terms a lot, so I would advise you to build `search_terms` using `paste` and stored variables such as `t1 <- "A1 + A2"`, `t2 <- "B1 + B2"`, etc. I see you have some interaction requirements as well. You can have your minimal building term be `(A1 + A2):F1` in your case, as you want all of this included if `A1 + A2` is included.
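As a rough sketch of the `paste` suggestion, the simplified example could be assembled from stored strings as below. The names `t1` and `t2` come from the post; `with_b1`, `with_b2` and `search_terms_pairs` are only illustrative, and the code just builds the character vectors without calling projpred.

```r
# Reusable building blocks, stored once.
t1 <- "A1 + A2"
t2 <- "B1 + B2"

# Simplified example: B1 can only enter on top of the A pair.
with_b1 <- paste(t1, "B1", sep = " + ")
search_terms <- c("1", t1, with_b1)
print(search_terms)

# Same pattern with the B pair kept together (illustrative extension).
with_b2 <- paste(t1, t2, sep = " + ")
search_terms_pairs <- c("1", t1, with_b2)
print(search_terms_pairs)
```

Storing each pair once means a change to a pair only has to be made in one place.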
Does this help at all?
Thanks for the question, this is an interesting one!
Thanks, @AlejandroCatalina. The growing of terms makes sense. But, for the interaction, would having just `(A1 + A2):F1` in `search_terms` allow for the model to have the lower-order effects of `A1` and `A2` as well? You imply it would, but I am struggling to understand the logic of the syntax used here because `:` usually refers only to the interaction and not the lower-order effects as well. It’s almost like `(A1 + A2):F1` in `search_terms` means `(A1 + A2)*F1` or `A1 + A2 + (A1 + A2):F1`, but maybe I am completely misunderstanding the syntax here.
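The distinction between `:` and `*` in ordinary R formulas can be checked with base R's `terms()`, as in the small sketch below. It uses the factor name `F` rather than the coefficient name `F1`, and the helper `labels_of` is a made-up name for illustration. It says nothing about how projpred itself parses `search_terms`.

```r
# Helper (illustrative name): list the terms a formula expands to.
labels_of <- function(f) attr(terms(f), "term.labels")

# ":" alone gives only the interaction terms.
only_interaction <- labels_of(~ (A1 + A2):F)
print(only_interaction)

# "*" gives main effects (including F) plus the interactions.
full_crossing <- labels_of(~ (A1 + A2) * F)
print(full_crossing)

# Main effects of the pair plus their interactions with F, without F itself.
explicit_block <- labels_of(~ A1 + A2 + (A1 + A2):F)
print(explicit_block)
```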
You’re absolutely right, I made a mistake. You can indeed introduce `A1 + A2 + (A1 + A2):F` as the building block.
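Putting the pieces together for the original problem, one possible sketch is below. It assumes, following the description earlier in the thread, that every allowed submodel should be listed as a member of `search_terms`, so it enumerates all combinations of the three blocks. Whether projpred needs the full enumeration or only the blocks themselves has not been checked here and may depend on the version. The names `pairs`, `blocks` and `combos` are illustrative, and the main effect of `F` is not handled.

```r
# The three predictor pairs that must always enter together.
pairs <- c("A1 + A2", "B1 + B2", "C1 + C2")

# One building block per pair: main effects plus their interactions with F.
blocks <- paste0(pairs, " + (", pairs, "):F")

# Assumption: list every non-empty combination of blocks as a valid submodel.
combos <- unlist(lapply(seq_along(blocks), function(k) {
  combn(blocks, k, FUN = paste, collapse = " + ")
}))

# "1" is the intercept-only model, as in the simplified example.
search_terms <- c("1", combos)
print(search_terms)
```

The resulting character vector would then be supplied as the `search_terms` argument.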
Great – all makes sense now. Thank you.