Correlated predictor variables in brms

Hi everyone,

I am currently looking into correlated predictor variables in a Bayesian mixed model using brms. I have a couple of questions I'm hoping someone might help me with.

I have 4 predictor variables (let's call them Pop, For, Acc1 and Acc2), all of which mean quite different things. The two Acc variables are similar types of data but are derived differently, and they mean somewhat different things. Two pairs of these variables are highly correlated (|r| ≈ 0.8), in the following way (model setup sketched just after the list):

Pop is negatively correlated with For
Acc1 is positively correlated with Acc2
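
Roughly what I have in mind, with a hypothetical data frame d, response y, and grouping factor site standing in for my real data:

```r
library(brms)

# Pairwise correlations among the four predictors
cor(d[, c("Pop", "For", "Acc1", "Acc2")])

# Mixed model with all four (correlated) predictors entered together
fit_all <- brm(y ~ Pop + For + Acc1 + Acc2 + (1 | site), data = d)
summary(fit_all)
```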

My questions are:

  1. can correlated predictors be included in the same Bayesian model?
  2. if not, what’s the best way to choose between them?

Any advice much appreciated!

Howdy. This blog post and its comments might help you: Collinearity in Bayesian models | Statistical Modeling, Causal Inference, and Social Science (columbia.edu)

Thanks for the response! I’ve read through it, but I’m not really sure they came up with a clear decision or solution in the end.

If you have McElreath’s Statistical Rethinking book, he has a good example of when it can be a problem (in terms of interpreting coefficients). That example is worked out in @Solomon’s translation of Statistical Rethinking into brms: 6 The Haunted DAG & The Causal Terror | Statistical rethinking with brms, ggplot2, and the tidyverse: Second edition
You can include the correlated predictors in the same model, but if the collinearity is very strong then, as Gelman writes in the blog post that I linked first, you “can learn about the linear combination but not as much about the separate parameters”. This is nicely seen in the leg length example in Statistical Rethinking. You can see the signs of this in the standard errors (posterior standard deviations) of the coefficients, which will be quite large when this happens.
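
To make that concrete, here is a small simulation in the spirit of the leg length example (made-up variable names, effect sizes, and sample size):

```r
library(brms)

set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)        # x1 and x2 correlated at roughly 0.95
y  <- 1 + 0.5 * x1 + 0.5 * x2 + rnorm(n)
d  <- data.frame(y, x1, x2)

fit <- brm(y ~ x1 + x2, data = d)
summary(fit)  # Est.Error for b_x1 and b_x2 is inflated by the collinearity

# The individual slopes are poorly identified, but their sum is not:
draws <- as_draws_df(fit)
sd(draws$b_x1)               # relatively large
sd(draws$b_x1 + draws$b_x2)  # much smaller
```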
If this is a problem and you want to choose between variables, then you could use projection predictive variable selection, as @avehtari mentioned in one of the comments to Gelman’s blog post. An example is here: Bayesian version of Does model averaging make sense?
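
A rough sketch of what that looks like with the projpred package, assuming a brms fit that contains all the candidate predictors (here called fit_all, as in your sketch above; the exact interface depends on your projpred version):

```r
library(projpred)

# Use the full brms fit (all correlated predictors) as the reference model
vs <- cv_varsel(fit_all)

plot(vs, stats = "elpd")  # predictive performance vs. number of terms
suggest_size(vs)          # suggested number of predictors to keep
```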


The simple answer is yes. But as you can guess from @jd_c’s response, you would need to tell us more about the phenomenon you are modeling and what the goal of the modeling is, so that it would be possible to say how the correlation affects what you can learn. If you are interested just in predictions, there are usually no problems. If you want to make inferences about parameter values with some causal interpretation, then there can be problems (but you asked only whether you can include them in the same model, and for that the answer is yes).
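
As a quick illustration of the prediction side, continuing @jd_c’s simulated fit above (just a sketch): the posterior for the expected prediction stays narrow even though the separate slopes are uncertain.

```r
# Expected value of the outcome at x1 = 1, x2 = 1
pred <- posterior_epred(fit, newdata = data.frame(x1 = 1, x2 = 1))
quantile(pred, c(0.05, 0.5, 0.95))  # narrow interval despite wide marginals for b_x1 and b_x2
```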
