Currently, I am building a multilevel model to analyze a panel dataset. Since I have over 30 variables across the different levels in the data I want to do some variable selection. It is, however, not feasible to do some forward or backward selection since the standard models already take hours to run when only using five variables. Does anyone know a fast way to do some kind of variable selection in a bayesian multilevel model and to find a way to know in which level to put a variable?
Do you think all 30 are equally valid explanatory variables? Would it be feasible to put all 30 in at once?
You could always put sparsity-favoring priors on the coefficients like a Laplace or a Horseshoe then try a MAP estimate. The variables with zero coefficients you’d then exclude. I’ve seen people do this a lot in frequentist regressions with LASSO.
@arya Thanks for your help!
Do you know whether this approach is also possible for determining the variables in each level of my multilevel model? It sounds to me that this only works in the lowest level of the multilevel model. Or do you maybe know a different approach for determining which variables to use in each level?
Are you sure that all variables make sense in all levels?
I would try to make first a model with all sensible variables included, and then use projpred https://arxiv.org/abs/1810.02406. It is possible that the full model is computationally too heavy.
We don’t have ready made code for projpred for multilevel models, because we haven’t had good multilevek examples with that many variables, but it shouldn’t be hard to modify the existing code. porjpred is especially useful with small data, but if you have lot’s of observations, then it might be enough to look at the posterior of the full model. Although is there are correlating variables, the marginals are misleading. I don’t recommend MAP which @arya mentioned.
@avehtari Thanks for your suggestion!
Why is don’t you recommend using the MAP? Because this approach looks like to be much easier to implement for a multilevel model. However, putting Laplace priors on the \beta parameters is only possible in the lowest layer of the model.
Maximum of the joint posterior is not a good solution in multilevel models. You need to integrate over the random effects or you will severely over-fit. Even non-Bayesians integrate over the random effect space (see, e.g. glmer function - RDocumentation)
Yes, this is a good point. I didn’t realize you were doing a multilevel model.
Do you have any suggestions about how one would go about doing this? I tried to take a look at the code but it is a bit obscure.
One of my students is working on this. If you are in a hurry, send email.