With one year of experience in frequentist statistics, I recently dived into the world of Bayesian regression because my dataset has complete separation and confounding issues. Most discussions of the separation issue suggest using Bayesian models to regularise the otherwise inflated coefficient estimates and standard errors, so that meaningful inferences can be drawn from the model output. However, I’m unsure what the workflow for Bayesian regression looks like in terms of prior choice, model selection method, and data transformation.
To my understanding, different priors will influence the posterior estimation, so prior selection is crucial, especially when the dataset has complete separation issues. As for model selection, there are mainly two approaches. One is to build up from a simple model, comparing `loo()` with and without each predictor to decide which predictors stay in the model (similar to forward stepwise selection in frequentist models). The other is projection predictive feature selection: fit a reference model and see which submodel performs similarly to it (loosely analogous to backward elimination in the frequentist approach, but not quite). But I’m unsure what the first step should be.
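Just to check my own understanding of why a prior helps at all, here is a toy sketch I put together (Python for illustration only; the separated data and the Normal(0, 2.5) prior are made up, and a real analysis would use full posterior sampling rather than this MAP-style penalty):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy completely separated data: y is 0 whenever x < 0 and 1 whenever x > 0.
x = [-2.0, -1.0, 1.0, 2.0]
y = [0, 0, 1, 1]

def fit_slope(prior_sd=None, steps=5000, lr=0.1):
    """Gradient ascent on the log-likelihood of a slope-only logistic model.
    If prior_sd is given, a Normal(0, prior_sd) log-prior is added, so the
    result is a MAP estimate rather than the MLE."""
    b = 0.0
    for _ in range(steps):
        grad = sum((yi - sigmoid(b * xi)) * xi for xi, yi in zip(x, y))
        if prior_sd is not None:
            grad -= b / prior_sd ** 2  # gradient of the Gaussian log-prior
        b += lr * grad
    return b

mle_b = fit_slope()              # keeps drifting upward: no finite MLE exists
map_b = fit_slope(prior_sd=2.5)  # the prior pulls the slope to a finite value
print(mle_b, map_b)
```

Under separation the likelihood keeps increasing as the slope grows, so the "MLE" here just drifts upward with more iterations, while the penalised (MAP) slope settles at a finite value.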
Let’s say my logistic regression model looks like this:

`y ~ x1 + x2 + x3 + x4 + (1|group)`
with `x1`, `x2`, `x3` being categorical predictors with multiple categories, and `x1` and `x3` having complete separation issues. `x4` is a continuous predictor that I’m unsure whether I should transform, and `group` is the random effect.
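For context, this is roughly how I convinced myself that `x1` and `x3` are separated: for a single categorical predictor, every level is observed with only one outcome class. A quick sketch (Python, with made-up data; note this is only a per-predictor check, not the general linear-combination definition of separation):

```python
# Hypothetical helper: a categorical predictor separates a binary outcome
# when each of its levels co-occurs with only one outcome class.
def completely_separates(levels, y):
    classes_per_level = {}
    for lvl, yi in zip(levels, y):
        classes_per_level.setdefault(lvl, set()).add(yi)
    return all(len(c) == 1 for c in classes_per_level.values())

x1 = ["a", "a", "b", "b", "c"]   # made-up levels
y = [0, 0, 1, 1, 1]              # each level maps to a single class
print(completely_separates(x1, y))        # -> True

y_mixed = [0, 1, 1, 1, 1]                 # level "a" now has both classes
print(completely_separates(x1, y_mixed))  # -> False
```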
If I decided to take the “build from simple” approach (i.e. start from `y ~ 1`), should I:

- First decide which predictors stay, using a weakly informative prior (e.g. `normal(0, 2.5)`), and then, after arriving at the final model, test which prior works better for the complete separation, or
- Test whether `x1` should stay using a weakly informative prior; if it stays, test which prior works better for `x1`, and then move on to `x2`?
And if I decided to use projection predictive feature selection, should I:

- Test which prior works best for all the predictors in the reference model first, and then do projection predictive feature selection, or
- Do projection predictive feature selection with weakly informative priors first, and then, once I have the best submodel, test which prior works better for the remaining predictors?
My next question is: where does data transformation (e.g. whether or not to log- or sqrt-transform `x4`) fit into this model selection workflow, in both the “build from simple” and the projection predictive feature selection approaches?
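The only concrete check I know of for this is comparing skewness before and after a candidate transformation, something like the following (Python sketch with a simulated right-skewed stand-in for `x4`; the lognormal data and the skewness cut-off are my own assumptions, not from any reference):

```python
import math
import random

random.seed(0)

def skewness(v):
    """Sample skewness: third central moment over cubed standard deviation."""
    n = len(v)
    m = sum(v) / n
    sd = math.sqrt(sum((t - m) ** 2 for t in v) / n)
    return sum((t - m) ** 3 for t in v) / (n * sd ** 3)

# Hypothetical right-skewed continuous predictor standing in for x4.
x4 = [random.lognormvariate(0, 1) for _ in range(5000)]

raw = skewness(x4)                              # strongly right-skewed
logged = skewness([math.log(t) for t in x4])    # roughly symmetric
rooted = skewness([math.sqrt(t) for t in x4])   # somewhere in between
print(raw, rooted, logged)
```

For this particular simulated predictor the log transform removes almost all of the skew and the square root only part of it, but I realise that reducing skew is not by itself a justification for transforming a predictor in a regression.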
My final questions are about complete separation:

- What are the common priors for dealing with complete separation? Some people say weakly informative priors such as `normal(0, 2.5)` or `student_t(7, 0, 2.5)` could work, while others suggest a horseshoe prior or other more “penalising” priors.
- Should I use one universal prior for all the predictors, or is it better to use special priors only on the predictors that have complete separation?
- How do I know I have actually “solved” the complete separation issue?
Sorry for spamming questions in this essay-long post. I must admit that, as a beginner in Bayesian statistics, I still lack a huge chunk of knowledge on how to deal with Bayesian regression, and when I read posts online I often end up with even more questions. But I’m eager to learn more, and if you feel I haven’t done enough background research for these questions, please don’t hesitate to point me to relevant references.
