Variable selection in Bayesian multilevel model

c5v · February 5, 2019, 3:47pm

Currently, I am building a multilevel model to analyze a panel dataset. Since I have over 30 variables across the different levels of the data, I want to do some variable selection. Forward or backward selection is not feasible, however, because the standard models already take hours to run with only five variables. Does anyone know a fast way to do variable selection in a Bayesian multilevel model, and a way to decide at which level each variable belongs?

arya · February 6, 2019, 3:38am

Do you think all 30 are equally valid explanatory variables? Would it be feasible to put all 30 in at once?

You could always put sparsity-favoring priors on the coefficients, like a Laplace or a horseshoe, then try a MAP estimate. You’d then exclude the variables with zero coefficients. I’ve seen people do this a lot in frequentist regressions with LASSO.
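To make the Laplace-prior idea concrete, here is a minimal sketch of the simplest possible case: a single-level problem where each coefficient has its own unit-variance estimate `z` (for example, an orthonormal design with known noise variance 1). Under a Laplace prior with rate `lam`, the MAP estimate minimizes `0.5 * (z - beta)**2 + lam * abs(beta)`, which has the closed-form soft-thresholding solution used by LASSO. The numbers in `ols` and the value of `lam` are made up for illustration, and nothing here accounts for random effects.

```python
def soft_threshold(z, lam):
    """MAP estimate of beta when z ~ Normal(beta, 1) and beta has a
    Laplace prior with density proportional to exp(-lam * |beta|)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # estimates within lam of zero are set exactly to zero


# Hypothetical unpenalized estimates for five coefficients (illustration only).
ols = [2.5, -0.3, 0.8, -1.7, 0.1]
lam = 1.0  # assumed prior rate; larger values give sparser solutions

map_est = [soft_threshold(z, lam) for z in ols]

# Indices of the variables that survive the thresholding.
selected = [i for i, b in enumerate(map_est) if b != 0.0]
print(map_est, selected)
```

Only the coefficients whose unpenalized estimate exceeds `lam` in absolute value stay in the model; the rest are set exactly to zero, which is what makes dropping them straightforward.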

c5v · February 6, 2019, 8:20am

@arya Thanks for your help!

Do you know whether this approach can also be used to determine the variables at each level of my multilevel model? It seems to me that it only works at the lowest level of the model. Or do you know of a different approach for deciding which variables to use at each level?

avehtari · February 6, 2019, 8:23am

Are you sure that all variables make sense in all levels?
I would first try to build a model with all sensible variables included, and then use projpred (https://arxiv.org/abs/1810.02406). It is possible that the full model is computationally too heavy.
We don’t have ready-made code for projpred for multilevel models, because we haven’t had good multilevel examples with that many variables, but it shouldn’t be hard to modify the existing code. projpred is especially useful with small data, but if you have lots of observations, then it might be enough to look at the posterior of the full model. However, if there are correlated variables, the marginal posteriors are misleading. I don’t recommend MAP, which @arya mentioned.

c5v · February 6, 2019, 8:58am

@avehtari Thanks for your suggestion!

Why don’t you recommend using MAP? I ask because this approach looks much easier to implement for a multilevel model. However, putting Laplace priors on the \beta parameters is only possible in the lowest layer of the model.

avehtari · February 6, 2019, 9:15am

The maximum of the joint posterior is not a good solution in multilevel models. You need to integrate over the random effects or you will severely overfit. Even non-Bayesians integrate over the random-effect space (see, e.g., the glmer function documentation on RDocumentation).
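A toy illustration of why the joint maximum misbehaves, as a sketch under strong simplifying assumptions rather than a general result: take a balanced random-intercept model y_ij = u_j + e_ij with e_ij ~ Normal(0, 1), u_j ~ Normal(0, tau^2), a known zero overall mean, and a flat prior on tau. The group means in `ybar` are made up. Maximizing the joint density over the random effects `u` and `tau` together is degenerate, because the density grows without bound as tau goes to 0 with all u_j shrunk to 0. Integrating `u` out first gives a marginal likelihood with a finite, sensible maximum.

```python
import math

# Hypothetical group means for 6 groups with n = 10 observations each.
ybar = [1.2, -0.8, 0.5, -1.5, 2.0, -0.3]
n = 10


def profile_joint(tau):
    """log p(y, u | tau) maximized over u, up to a constant not involving tau."""
    shrink = n * tau**2 / (n * tau**2 + 1.0)  # conditional mode is u_j = shrink * ybar_j
    total = 0.0
    for m in ybar:
        u = shrink * m
        total += -0.5 * n * (m - u) ** 2 - math.log(tau) - 0.5 * u**2 / tau**2
    return total


def marginal(tau):
    """log p(y | tau) with u integrated out: ybar_j ~ Normal(0, tau^2 + 1/n)."""
    v = tau**2 + 1.0 / n
    return sum(-0.5 * math.log(v) - 0.5 * m**2 / v for m in ybar)


# The marginal likelihood has an interior maximum; find it on a grid.
grid = [k / 100 for k in range(1, 301)]
tau_marginal = max(grid, key=marginal)
print("marginal estimate of tau:", tau_marginal)

# The profiled joint density keeps increasing as tau shrinks toward zero.
for tau in (tau_marginal, 1e-3, 1e-6):
    print(tau, profile_joint(tau))
```

In this toy setup the marginal estimate of tau reflects the spread of the group means, while the joint objective prefers ever smaller tau regardless of the data. Real multilevel models have more parameters and proper priors, so the details differ, but the joint mode is still not a reliable summary.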

arya · February 6, 2019, 7:49pm

Yes, this is a good point. I didn’t realize you were doing a multilevel model.

ernest · February 8, 2019, 11:11am

It is not Bayesian and may sound amateurish, but what about variance inflation factors (e.g., Assaf et al. 2019, Yu et al. 2015)?

Matias_Guzman_Naranjo · February 9, 2019, 12:44pm

Do you have any suggestions about how one would go about doing this? I tried to take a look at the code but it is a bit obscure.

avehtari · February 11, 2019, 2:42pm

One of my students is working on this. If you are in a hurry, send me an email.
