I’m confused about different approaches people use for Bayesian model selection.
I understand the frequentist approach is generally to fit the most complex model first, with all predictors and their hypothesized interactions, then run subsequent models removing one term at a time, using something like a likelihood ratio test to compare models and selecting the simplest model that does not significantly reduce fit.
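For concreteness, here is a minimal sketch of what I mean, assuming a hypothetical data frame `df` with response `y` and predictors `x1` and `x2`:

```r
# Hypothetical example: start from the most complex model and drop terms,
# comparing nested models with a likelihood ratio test.
m_full    <- glm(y ~ x1 * x2, data = df, family = gaussian())
m_reduced <- glm(y ~ x1 + x2, data = df, family = gaussian())  # interaction dropped

# anova() on nested glms with test = "LRT" performs the likelihood ratio test;
# a non-significant result suggests the simpler model fits about as well.
anova(m_reduced, m_full, test = "LRT")
```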
I’ve seen numerous papers using Bayesian models do this the opposite way: each predictor is fitted first on its own, and only “significant” predictors (those whose credible intervals don’t include 0) are carried into the next round.
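As I understand it, the screening step in those papers looks roughly like this (a sketch using brms; `df`, `y`, `x1`, `x2` are hypothetical):

```r
library(brms)

# Hypothetical screening step: fit each predictor on its own.
m_x1 <- brm(y ~ x1, data = df)
m_x2 <- brm(y ~ x2, data = df)

# fixef() returns posterior means with 95% credible intervals (Q2.5, Q97.5);
# the screening rule keeps a predictor only if its interval excludes 0.
fixef(m_x1)
fixef(m_x2)
```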
I’m struggling to find information about what the benefit of starting simple with individual predictors is, other than it being easier to get models to converge. Wouldn’t you risk throwing away important predictors that are non-significant on their own but matter in an interaction with another variable? Am I missing something, or, if I’m able to get my most complex model to converge, would it be preferable to start complex and drop terms?
Without knowing the field of application and the specifics of the problem, a general procedure would be:
- Model fitting would start from the simplest model (to make sure your model is OK, you understand the data, etc.) and then build up to the complete model with all predictors. There is no feature selection at this stage unless it is motivated by model issues (e.g. extreme correlations between features), not by “significance”.
- Once you are satisfied with step 1, you can move on to hypothesis testing (which features are relevant, given all the others), which happens on the full model. You can also look into projpred by @avehtari for identifying models with a smaller set of features, always starting from the full model (see the sketch below).
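A minimal sketch of the projpred route, assuming an rstanarm reference model and hypothetical variables `y`, `x1`, `x2`, `x3` in a data frame `df` (argument names may differ across projpred versions):

```r
library(rstanarm)
library(projpred)

# Step 1: build up to the full reference model (no significance filtering).
fit_ref <- stan_glm(y ~ x1 * x2 + x3, data = df, family = gaussian())

# Step 2: projection predictive variable selection, starting from the full model.
vs <- cv_varsel(fit_ref)            # cross-validated search over submodels
plot(vs, stats = "elpd")            # predictive performance vs. submodel size
nsel <- suggest_size(vs)            # suggested number of terms to keep
proj <- project(vs, nterms = nsel)  # project the full posterior onto the submodel
```

The point of the projection approach is that the submodel is chosen by predictive performance relative to the full model, not by marginal “significance” of individual terms, so the concern about discarding predictors that only matter in interactions does not arise in the same way.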
The Bayesian workflow paper and my talk related to the paper discuss the benefits of starting from simple. The covariate (and some other model-structure) selection cases are special, as there is a combinatorial explosion in the number of models, but it’s easy to include covariates, and a sensible prior plus appropriate use of decision theory helps in going from the biggest model to a submodel with similar performance, as discussed in a video and in paper 1, paper 2, and paper 3.
So it depends on the case which direction is easier.