Many "onehot" indicator predictors

aaa · March 4, 2022, 7:09pm

I am using brms to model a real-valued outcome, with the goal of building a model whose coefficients can be used for prediction.

Many (50+) of the columns in my dataset are indicator variables coded as “onehot” (1 = presence, 0 = absence) for various potentially useful predictors.

I expect there to be correlation (and possibly 1-way or 2-way causation) for some of these predictors. For example, variables exposure1-exposure50 can each take values of 1 or 0, and subsets of these 50 may at some level be causally related.

Section 12.7 of Regression and Other Stories (RAOS), “Models for regression coefficients”, has been informative, as have McElreath’s cautious admonitions about avoiding “Causal Salad”. I am looking for further resources and examples that discuss strategies for including:

- many onehot predictors where correlation structures may be present
- many onehot predictors in conjunction with non-onehot predictors
- guidance with particular emphasis on inclusion of many such onehot variables, ideally using the brms package.

Thanks in advance.

Ara_Winter · March 31, 2022, 10:56pm

If I am understanding your data structure and model, it’s something like:
y predicted by onehot1 + onehot2 + onehot3 + … + onehotn

What kind of model are you thinking of using? Like a linear regression to start? Do you have a simple model up and running on some simulated data in brms?
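
For anyone wanting a concrete starting point, here is a minimal sketch of what such a simulation could look like. Everything in it is an assumption made for illustration: the sample size, the shared latent factor used to induce correlation among the indicators, the `age` covariate, the true effect sizes, and the `normal(0, 1)` prior are all made up, and only the `exposure1`-`exposure50` naming comes from the question. The fit itself is switched off by default because it needs brms and a working Stan toolchain.

```r
# Simulate 50 correlated 0/1 indicators plus one continuous predictor.
# All sizes, names, and effect values are hypothetical.
set.seed(1)
n <- 500   # observations
p <- 50    # number of onehot indicators

# Correlated indicators: threshold latent normals that share a common factor.
# The length-n vector is recycled down each column of the n x p matrix.
shared <- rnorm(n)
latent <- 0.6 * shared + matrix(rnorm(n * p), nrow = n, ncol = p)
X <- (latent > 0) * 1L
colnames(X) <- paste0("exposure", seq_len(p))

# Only the first five indicators have a nonzero effect in this toy setup.
beta <- c(rep(1.5, 5), rep(0, p - 5))
age  <- rnorm(n)   # a non-onehot predictor
y    <- as.vector(X %*% beta) + 0.5 * age + rnorm(n)

sim <- data.frame(y = y, age = age, X)

# Build y ~ exposure1 + ... + exposure50 + age without typing every term.
f <- reformulate(c(colnames(X), "age"), response = "y")

# Set to TRUE to fit; requires the brms package and a C++ toolchain.
run_fit <- FALSE
if (run_fit && requireNamespace("brms", quietly = TRUE)) {
  fit <- brms::brm(
    f, data = sim, family = gaussian(),
    # Weakly informative prior on all slopes (an assumption, not a recommendation).
    prior = brms::set_prior("normal(0, 1)", class = "b"),
    chains = 2, iter = 1000, seed = 1
  )
  print(summary(fit))
}
```

Because the true coefficients are known here, comparing them with the posterior summaries gives a quick check of how the model copes with correlated indicators before moving to the real data. A shrinkage prior (for example brms's `horseshoe()`) could be swapped in for the normal prior, though whether that suits the real problem is a separate question.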
