Categorical reference level changes output in binomial GLMM

I was fitting my model and was just messing around with the reference level. When doing this I noticed that when I switched reference level for my categorical variable status; it changes the coefficients of the model. It should only change the signs when I have a dichotomous categorical variable, but some of the actual values are changing.

male_degree and Intercept seem to have the biggest change.

Also when looking for Multicolinearity it also changes the VIF values when I switch between different reference levels.



What could be causing this?


This behavior would be expected. You can show yourself this with a simulation like the following:

n <- 1000
trials <- rpois(n, lambda=100)
x1 <- rnorm(n, 0, 1)
x2 <- rnorm(n, 0, 1)
trt <- c(rep(0, n/2), rep(1, n/2))
alpha <- -3
b1 <- 1
b2 <- 0.5
b3 <- -2
b4 <- 0.1
p <- plogis(alpha + b1*x1 + b2*x2 + b3*trt + b4*trt*x2)
y <- rbinom(n, size=trials, prob=p)
d1 <-, x1, x2, trt, trials)
d1$trt <- factor(d1$trt)
d1$trt.rl <- relevel(d1$trt, ref = "1")      #switch the reference level


#this model recovers the programmed parameters from the simulation
m1 <- brm(y | trials(trials) ~ 1 + x1 + x2 + trt + trt:x2, data=d1, family=binomial, cores=4)

#this model uses the switched reference level and the parameter estimates for Intercept and x2 are different (in addition to the switched signs on the categorical variable and the interaction)
m2 <- brm(y | trials(trials) ~ 1 + x1 + x2 + trt.rl + trt.rl:x2, data=d1, family=binomial, cores=4)

Your intercept is the value (on the scale of the linear predictor; here logit) when all continuous predictors are held at zero and all categorical predictors at their reference level. Thus, when you switch reference levels, you change the meaning, and thus value, of the intercept in the model. Notice in the simulation the relationship between the value for trt and Intercept and trt.rl and Intercept in each model. Intercept added to trt in m1 is the same as Intercept in m2, and vice versa. The same thing goes for your model output above with Intercept and status. In addition, when you have an interaction with the switched categorical variable and some other variable, then those values will also naturally change, as the meaning of the predictors has been changed.

One thing that might be helpful for you is to create simulations like I did above or like I did for you here Modeling heteroskedasticity using the auxiliary parameter phi for beta_binomial distribution - #4 by jd_c as these simulations can really help you understand what is going on, if you play around with changing the inputs to the simulations. I think some of your other questions on other posts were answered in the first simulation that I wrote. Doing these sort of sims has been helpful to me, when I don’t understand why things are happening:)

I can’t remember how VIF is calculated, so I don’t know about that.

Great, yeah your simulations have been really helpful.

Is there a book or tutorial that you would recommend with the simulations?

If not I will just use yours as a starting ground.