Variable selection with CAR (conditional autoregressive) component using the projpred package

Theresa_U · September 27, 2022, 1:17pm

Thank you for developing the projpred package. I am trying to apply the variable selection in the projpred R package to a Poisson model that I fit in brms. The model includes several ordinary covariates (and possible interactions) and a CAR (conditional autoregressive) component for the spatial dependencies of US counties. When I tried to apply projpred's variable selection to the brms fit, I got an error message saying that this is not yet implemented for brms. Do you know if it is implemented in Stan or how I should handle the CAR component? I saw that you wrote an article “Projection predictive model selection for Gaussian processes” and, if I am correct, CAR models are a form of Gaussian processes. I want to do variable selection only for the ordinary covariates, but I think I should include the CAR component to adjust for spatial confounding.

Best,
Theresa

avehtari · September 27, 2022, 4:19pm

projpred needs to know some things about the models. In the cases of rstanarm and brms, for a set of models including normal/generalized/additive/hierarchical linear models, projpred knows enough about the models. In theory, a CAR model should be similar to the supported hierarchical models, but even then adding support for CAR requires some coding, and we have limited resources. It is also possible that implementing the projection for CAR requires taking into account something special about CAR, so it would require a bit of thinking and experimenting, too. If you are interested just in the variable selection, and don’t need to project the CAR part, it would be possible for you to implement the get_refmodel and init_refmodel functions yourself, but as this is not the simplest case, I do understand if this is beyond your current skill set. I’m pinging @fweber144 and @AlejandroCatalina in case they have something to add.

As the projpred support for car models would not be quickly available, I’m also checking whether you really need projpred or if some other approach would be good. How many covariates do you have? How many interactions? How many observations? What is the purpose of the variable selection?

Theresa_U · September 27, 2022, 4:56pm

Thank you for the clarification. I will have a look at the get_refmodel and init_refmodel functions.

I have about 60 variables, some of which are highly correlated. Additionally, several variables form groups (e.g. a pesticide group consisting of three single pesticides). I am not sure how best to deal with the interactions. There are multiple plausible interactions. Is it a good approach to include all of them in the first step, or what is the generally suggested approach for interactions? I have 159 observations (counties) and many of the counts in my Poisson model are zero, so I was trying to fit a zero-inflated Poisson model. The purpose of the variable selection is to identify all relevant variables and interactions for the outcome. I read in your article that this was not the primary purpose of projpred (as opposed to variable selection for finding a minimal set of variables) but that it would still work well in this case.

avehtari · September 27, 2022, 6:24pm

With 60 variables and interactions and only 159 observations, this is a challenging task. Did you use something like a horseshoe or R2D2 prior for the coefficients? For the interactions, it would be good to use a prior that allows a big interaction only if the main effects are big. projpred also performs better when using good priors, so it’s good to start from there.
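
As a rough illustration of the prior suggestion above, here is a minimal sketch of how a horseshoe prior on the coefficients might be combined with a CAR term in brms. The simulated data, the chain-shaped adjacency matrix W, the variable names, and the par_ratio value are all made up for illustration and are not taken from the analysis discussed in this thread. It also does not cover the idea of tying the interaction priors to the main effects.

```r
library(brms)

set.seed(1)
n <- 30

# Hypothetical adjacency matrix: counties arranged in a chain, each one
# neighbouring the next (symmetric, every county has at least one neighbour)
W <- matrix(0, n, n)
for (i in seq_len(n - 1)) {
  W[i, i + 1] <- 1
  W[i + 1, i] <- 1
}
rownames(W) <- colnames(W) <- paste0("county", seq_len(n))

# Simulated covariates and counts (illustrative only)
dat <- data.frame(
  county = rownames(W),
  x1 = rnorm(n),
  x2 = rnorm(n),
  x3 = rnorm(n)
)
dat$y <- rpois(n, lambda = exp(0.3 + 0.5 * dat$x1))

# Horseshoe prior on all population-level coefficients. par_ratio is the prior
# guess of (number of relevant) / (number of irrelevant) coefficients; 0.5 is
# an arbitrary choice for this toy example.
coef_prior <- set_prior("horseshoe(df = 1, par_ratio = 0.5)", class = "b")
# An R2D2 prior could be tried instead (assumed values, not a recommendation):
# coef_prior <- set_prior("R2D2(mean_R2 = 0.5, prec_R2 = 2)", class = "b")

fit <- brm(
  y ~ x1 + x2 + x3 + car(W, gr = county, type = "icar"),
  data = dat,
  data2 = list(W = W),   # the adjacency matrix is passed via data2
  family = poisson(),    # zero_inflated_poisson() would be swapped in here
  prior = coef_prior,
  chains = 2, iter = 1000, seed = 1,
  control = list(adapt_delta = 0.99)
)

summary(fit)
```

Shrinkage priors like these often need a higher adapt_delta to avoid divergent transitions, which is why it is raised here; the exact value is a guess and would need checking against the sampler diagnostics.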

fweber144 · September 27, 2022, 7:01pm

I think @avehtari has mentioned all major points, so I don’t have much to add.

I can help with init_refmodel() if necessary. In that case, a reproducible example would be good.

For the projection of CAR components, you are welcome to create a feature request issue on projpred’s issue tracker. However, as mentioned by @avehtari, this is not likely to be implemented in projpred soon.

In my understanding, using projpred for complete variable selection (as opposed to minimal subset variable selection) is not trivial. Pavone et al. (2022) have conducted experiments for this. Which article did you refer to?

References

Pavone, F., Piironen, J., Bürkner, P.-C., & Vehtari, A. (2022). Using reference models in variable selection. Computational Statistics. DOI: 10.1007/s00180-022-01231-6

Theresa_U · September 28, 2022, 1:27am

I have not implemented any variable selection yet because I wanted to decide on the theoretical approach first. I will definitely try out the suggested priors. Is there a method to take into account in the variable selection that multiple variables belong to a bigger group (e.g. 3 pesticides all belong to a pesticide group)?

Theresa_U · September 28, 2022, 1:44am

Thank you for offering your help with the init_refmodel(). I will try to get started and come back to your offer.

I referred to the article “Projective inference in high-dimensional problems: Prediction and feature selection” by Juho Piironen, Markus Paasiniemi and Aki Vehtari, where the authors say “The empirical evidence indicates that the reference model approach could be highly useful also in this problem setting since it tends to help rank the truly relevant features before the irrelevant ones”. I will look more closely at the experiments you mentioned. Would you still recommend projpred in this case, or what other approach would you suggest?

fweber144 · September 28, 2022, 5:40am

For taking groups of variables into account, the arguments search_terms or penalty of varsel() and cv_varsel() might be helpful.

Ah ok. The subtle point, however, is that they refer to the more general reference model approach, not projpred specifically. The reference model approach is one of several important aspects of projpred and was later investigated in a more general framework by Pavone et al. (2022), also with respect to complete variable selection. So I guess Pavone et al. (2022) tackled the omitted part of the citation: “but the topic requires more research.” I don’t want to say that complete variable selection is impossible with projpred, but as can be seen from the iterative projpred procedure in Pavone et al. (2022, section “4.1 Iterative projections”), it requires quite a sophisticated approach.

Theresa_U · September 28, 2022, 7:58pm

Thank you for pointing that out. So would it probably be a good approach to compare the results from the reference model approach with projpred in my setting to other methods for complete variable selection, like the local false discovery rate?

fweber144 · October 2, 2022, 4:18am

You mean that you want to use methods other than projpred (that are made for complete variable selection) as a “gold standard” to see whether the projpred results can be interpreted as complete variable selection results? But in that case, why would you need (the non-iterative) projpred at all? Apart from that, if you have correlated predictors, projpred will likely give a different (sparser) solution, so in that case, I’m not sure whether a comparison makes sense in the first place.

Also, depending on your model, implementing the local FDR approach might not be that easy.

Perhaps @avehtari has more thoughts on this?
