Calculate VIF?

Hi! I am quite new to Bayesian statistics and I am having quite a hard time understanding how to deal with collinearity between parameters.

Is there a way I can calculate collinearity between my regressors as you would with VIF or an equivalence test?

I have a model with random effects and multiple categorical regressors with two levels each (0/1):

Outcome ~ Categorical1 + Categorical2 + Categorical3 + (1|person)
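
To make the VIF idea concrete, this is the kind of check I would run in a frequentist setting (just a sketch, assuming my data sit in a data frame called df with these columns):

library(car)   # for vif()

# Classical VIF on the fixed effects only (dropping the random intercept)
fit_lm <- lm(Outcome ~ Categorical1 + Categorical2 + Categorical3, data = df)
vif(fit_lm)    # one variance inflation factor per regressor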

Is there a way I can get around this in a Bayesian setup?
And any advice on how to solve multicollinearity?

Thank you very much in advance for your help.

I’m gonna go ahead and tag @avehtari and @andrewgelman on this, as they can provide much more complete answers than I could.

What is VIF?

I guess it’s the Variance Inflation Factor.

Yes, VIF as in Variance Inflation Factor.

You should be able to just fit this model directly in rstanarm or brms.
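
For example, something along these lines should work (a sketch, assuming your data are in a data frame df with the columns from your formula and a roughly continuous outcome, so adjust the family if needed):

library(brms)

fit <- brm(
  Outcome ~ Categorical1 + Categorical2 + Categorical3 + (1 | person),
  data   = df,           # assumed data frame with these columns
  family = gaussian()    # change e.g. to bernoulli() if the outcome is binary
)
summary(fit)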

Sorry, I might not have expressed myself properly…
I have run such models in brms, and for some of them I have up to 100-200 binary \beta_{categorical} parameters.

I have done the test for practical equivalence using the package bayestestR to check whether my parameter values should be accepted or rejected against the null hypothesis, i.e. whether or not the HDI of the posterior distribution of each parameter falls within a ROPE region.
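
Concretely, the check was roughly this (a sketch, with fit being a brms model like the one above):

library(bayestestR)

# Test for practical equivalence: is the HDI of each coefficient
# inside, outside, or only partly inside the ROPE?
equivalence_test(fit)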

I got warnings about possible multicollinearity between some of my parameters. The problem is that I then should not trust the results, because multicollinearity may shift the posterior distributions towards or away from the ROPE.

How can I properly estimate whether there is inflation in my models due to these correlations? What threshold for pairwise correlations should I consider intolerable (e.g. > 0.9)?
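
So far the only thing I could think of is looking at the pairwise correlations between the posterior draws of the coefficients, something like this (a sketch, and the 0.9 cutoff is just a guess on my part):

# Correlations between the posterior draws of the fixed effects (columns named b_...)
draws <- as.matrix(fit)
betas <- draws[, grep("^b_", colnames(draws)), drop = FALSE]
beta_cor <- cor(betas)

# Flag pairs above a tentative threshold, e.g. |r| > 0.9
which(abs(beta_cor) > 0.9 & upper.tri(beta_cor), arr.ind = TRUE)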

Should I reconsider my model design and perform a univariate analysis instead?

For correlated predictors I recommend projpred. We have a pull request which brings support for categorical variables and “random” effects, so if you are in a hurry and brave you can test it right now, or wait for a moment and look for the announcement when it’s merged. projpred works very well with correlated predictors, although it answers a slightly different question: “What is the minimal set of predictors providing the same predictive performance as the full model?”. You can find case studies and videos on multicollinearity and projpred at https://avehtari.github.io/modelselection/. If you need to find all predictors with some predictive information, then univariate approaches seem to be a good choice, and we’ll soon have a paper out with more recommendations.
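
A rough sketch of the workflow, assuming a brms fit called fit (interface details may change a bit once the pull request is merged):

library(projpred)

vs <- cv_varsel(fit)        # cross-validated projection predictive variable selection
plot(vs, stats = "elpd")    # predictive performance vs. number of predictors
suggest_size(vs)            # suggested number of predictors to keep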

Thank you @avehtari. The links to your resources are really useful, comprehensive and interesting for learning more along these lines.
However, as you say, projpred seems to answer a slightly different question from mine, and fits very well from a prediction point of view. Instead, I am more interested in evaluating the effect sizes that all the regressors have on my outcome of interest.
I guess that a univariate approach will do, and/or a multivariate approach assuming some degree of inflation in the variables showing correlations.
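
For the univariate route I was thinking of simply refitting the model one predictor at a time, roughly like this (a sketch, reusing the hypothetical df and predictor names from above):

# One model per predictor, keeping the random intercept
predictors <- c("Categorical1", "Categorical2", "Categorical3")
fits_uni <- lapply(predictors, function(p) {
  brm(reformulate(c(p, "(1 | person)"), response = "Outcome"), data = df)
})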