Feature selection via Stan

I’m not sure whether my topic fits into the “Modeling” category.
Coming from a frequentist background, I want to shift to Bayesian data analysis.

I have about 20 predictors (dichotomous, categorical, numeric) and one binary outcome variable (problem present / problem absent).

In this matter, I am looking for an adequate Bayesian framework to do feature selection before fitting a final model with the most relevant predictors. Am I even on the right track thinking this way?

Is such an approach implemented in rstan? Can you share some references one should read on Bayesian feature selection?

Best wishes and thanks in advance
Jens

This is one of those topics that has many answers, including “don’t do that”. The procedure you are suggesting, where you drop the less relevant predictors and refit the model on the same data set, is likely to exaggerate the effects of the predictors you keep. If you’re just doing prediction and don’t care about why it works, that might be fine, and comparing predictive error would be the way to go.
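For concreteness, here is a minimal sketch of comparing predictive error via approximate leave-one-out cross-validation with the loo package; the data frame `d`, outcome `y`, and predictor names are placeholders, not from the original post:

```r
library(rstanarm)
library(loo)

# Two candidate logistic regressions (hypothetical data frame `d`,
# binary outcome `y`, placeholder predictors x1 and x2)
fit_full  <- stan_glm(y ~ ., data = d, family = binomial())
fit_small <- stan_glm(y ~ x1 + x2, data = d, family = binomial())

# Approximate leave-one-out cross-validation for each fit
loo_full  <- loo(fit_full)
loo_small <- loo(fit_small)

# Differences in expected log predictive density (elpd) with standard
# errors; the model in the first row has the best estimated elpd
loo_compare(loo_full, loo_small)
```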

Or you could try some of the LASSO/horseshoe priors for your effects; Aki has a nice paper (http://ceur-ws.org/Vol-1218/bmaw2014_paper_8.pdf) about that. I’m not sure if rstanarm has that kind of model yet (?), but there are a few implementations floating around the users list.
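As a rough sketch of what that looks like in brms (assuming the same placeholder data frame `d` and binary outcome `y` as above):

```r
library(brms)

# Logistic regression with a horseshoe prior on all coefficients;
# set_prior("horseshoe(1)") puts the horseshoe on the "b" class
fit_hs <- brm(y ~ ., data = d, family = bernoulli(),
              prior = set_prior("horseshoe(1)", class = "b"))
summary(fit_hs)
```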


Thanks for the reply and for the paper.

A pretty general question, but: should I be worried about overfitting in the context of Bayesian analysis when adding too many predictors to the model?

My general advice is that you should only add predictors you have reason to believe will matter. If you’re adding a lot of predictors you’re not sure about, you’ll need some outside data to confirm what you find anyway, so just enjoy the fishing expedition for what it is. Most studies I can think of in that context are things like GWAS, where after you get your genes somebody is going to pick a bunch of them to investigate further, so what you really need is a ranking, not a cutoff.


Hi,

Should I be worried about overfitting in the context of Bayesian analysis when adding too many predictors to the model?

You can get overfitting if your model is badly misspecified, e.g. using a thin-tailed observation model when the data distribution is thick-tailed, or using a bad prior for the predictor weights. But if you use good models and priors, then there is no such thing as too many predictors (although there are computational limits).

See the following paper (and the references therein) on how to set a prior when you have many more predictors than observations:

Juho Piironen and Aki Vehtari (2017). On the Hyperprior Choice for the Global Shrinkage Parameter in the Horseshoe Prior. Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, PMLR 54:905-913.

The paper has Stan code, and the rstanarm and brms packages have support for easily defining these priors.
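For example, here is a minimal sketch using rstanarm’s hs() prior, with the global scale set from a prior guess p0 of the number of relevant predictors, as recommended in the paper (the data frame `d` and outcome `y` are placeholders):

```r
library(rstanarm)

p  <- 20       # total number of predictors
p0 <- 5        # prior guess for the number of relevant predictors
n  <- nrow(d)  # number of observations
tau0 <- p0 / (p - p0) / sqrt(n)  # global scale suggested by the paper

# Logistic regression with a regularized horseshoe prior on the coefficients
fit_rhs <- stan_glm(y ~ ., data = d, family = binomial(),
                    prior = hs(df = 1, global_df = 1, global_scale = tau0))
```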

I am looking for an adequate Bayesian framework to do feature selection before fitting a final model with the most relevant predictors.

See the following paper, which illustrates what happens if you “do feature selection before fitting a final model with the most relevant predictors.” The paper also describes the projection predictive approach, which uses decision theory to do the correct thing: it is able to do the selection while retaining the important part of the information in the full model.

Juho Piironen and Aki Vehtari (2017). Comparison of Bayesian predictive methods for model selection. Statistics and Computing, 27(3):711-735. doi:10.1007/s11222-016-9649-y.
The code is available at https://github.com/stan-dev/projpred (projection predictive variable selection).
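A minimal usage sketch (function names as in the projpred package; `fit_rhs` is the hypothetical regularized-horseshoe reference model from the sketch above):

```r
library(projpred)

vs <- cv_varsel(fit_rhs)       # cross-validated search over submodels
plot(vs, stats = "elpd")       # predictive performance vs. submodel size
nsel <- suggest_size(vs)       # suggested number of terms to keep
# Project the reference model's posterior onto the selected submodel
proj <- project(vs, nterms = nsel)
```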

Aki


Thanks a lot for this input! :-)

I would also like to add that the brms package is really helpful!