I am using brms to model a real-valued outcome, and the goal is to build a model whose coefficients can be used for prediction.
Many of the columns in my dataset (50+) are indicator variables coded as “onehot” (1 = presence, 0 = absence) for various potentially useful predictors.
I expect some of these predictors to be correlated (and possibly causally related, in one or both directions). For example, variables exposure1-exposure50 can each take values of 1 or 0, and subsets of these 50 may at some level be causally related.
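To make the setup concrete, here is a minimal sketch in R of the kind of data and model I have in mind. Everything in it is an assumption for illustration only: the data are simulated, `age_z`, `y`, and `fit_sketch` are made-up names, and the horseshoe prior is just one possible starting point for regularizing many coefficients, not something I am claiming is the right approach.

```r
# Hypothetical simulated data; only the names exposure1..exposure50 come from my real setup.
set.seed(1)
n <- 500
k <- 50

# Correlated 0/1 indicators: a shared latent factor shifts every column's probability.
latent <- rnorm(n)
X <- sapply(seq_len(k), function(j) rbinom(n, 1, plogis(-1 + 0.8 * latent)))
colnames(X) <- paste0("exposure", seq_len(k))

# One standardized non-onehot predictor (made up) alongside the indicators.
dat <- data.frame(X, age_z = rnorm(n))

# In this toy outcome only the first three exposures and age_z matter.
dat$y <- 2 * dat$exposure1 - 1.5 * dat$exposure2 + dat$exposure3 +
  0.5 * dat$age_z + rnorm(n)

# Builds y ~ exposure1 + ... + exposure50 + age_z
f <- reformulate(c(colnames(X), "age_z"), response = "y")

# Wrapped in a function so that nothing is compiled or sampled until it is called.
# The horseshoe prior here applies to all population-level coefficients, age_z included.
fit_sketch <- function(data) {
  brms::brm(
    f,
    data = data,
    family = gaussian(),
    prior = brms::set_prior("horseshoe(1)", class = "b"),
    chains = 4,
    seed = 1
  )
}
```

Calling `fit <- fit_sketch(dat)` would fit the model, assuming brms and a working Stan backend are installed. The sketch says nothing about the causal structure among the exposures, which is the part I am least sure how to handle.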
Section 12.7 of Regression and Other Stories (RAOS), “Models for regression coefficients”, has been informative, as have McElreath’s cautions about avoiding “Causal Salad”. I am looking for more in-depth resources and examples that discuss strategies for including:
- many onehot predictors where correlation structures may be present
- many onehot predictors in conjunction with non-onehot predictors
- guidance with particular emphasis on inclusion of many such onehot variables, ideally using the
Thanks in advance.