Adding random effects shifts intercept

I have tried to fit a generalized linear mixed model for predicting the probability of correctly identifying objects in images. This probability should depend on some fixed and random effects. Everything works fine until I include a random effect for the displayed image. Once this random effect is included, the model predicts probabilities that are higher than the empirical means for the specific conditions. This also happens when all other fixed and random effects are removed and only the influence of the image is examined:

library(dplyr)
library(brms)

# keep the response variable as well, otherwise brm() cannot find it
final_data <- all_data %>% 
  select(correct_concept_detection, original_img_name) %>% 
  mutate(original_img_name = factor(original_img_name))

model_formula <- brmsformula(correct_concept_detection ~ 1 + (1 | original_img_name),
                             family = bernoulli(link = 'logit'))
posterior <- brm(model_formula, data = final_data,
                 chains = 4, iter = 3000, warmup = 1000)

This code produces the following posterior:

 Family: bernoulli 
  Links: mu = logit 
Formula: correct_concept_detection ~ 1 + (1 | original_img_name) 
   Data: final_data (Number of observations: 7200) 
  Draws: 4 chains, each with iter = 3000; warmup = 1000; thin = 1;
         total post-warmup draws = 8000

Group-Level Effects: 
~original_img_name (Number of levels: 40) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     1.76      0.21     1.39     2.22 1.00      427      801

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     0.48      0.27    -0.06     0.99 1.01      274      438

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

The problem is that the intercept, which I expected to represent the grand mean of the data, is too high: 0.48 on the logit scale corresponds to a probability of 0.62, while the empirical mean is 0.56. Furthermore, the estimated random effects sum to -0.8 rather than zero. Another strange effect is that when I calculate predictions for each image (by adding its random effect to the intercept), those predictions are correct and match the empirical means for the individual images. If I simplify the model even further and remove the random effect of image, the intercept is fitted as expected (about 0.24).

My question is therefore: why is it possible that the intercept term does not represent the grand mean of the data? Did I miss something obvious?

  • brms Version: 2.19.0

Hi @lcreining, if I’m understanding you correctly, I believe that the issue you’re seeing here is because you’re implicitly making your interpretation conditional on the random effects (specifically, on them being set at zero). Because of the nonlinear transformation of the logit link function, the predicted probability for your subjects on average (the ‘grand mean’) is not equivalent to the predicted probability for a hypothetical average subject (the probability indicated by the intercept).
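To put numbers on that, here is a quick Monte Carlo sketch plugging in the point estimates from your output (intercept 0.48, sd 1.76); this uses only the posterior means rather than the full posterior, so treat the values as illustrative:

```r
inv_logit <- function(x) 1 / (1 + exp(-x))

b0   <- 0.48  # posterior mean of the intercept (logit scale)
sd_u <- 1.76  # posterior mean of sd(Intercept)

# probability for a hypothetical 'average' image (random effect set to 0)
inv_logit(b0)  # ~ 0.62

# probability averaged over the random-effects distribution
set.seed(1)
u <- rnorm(1e6, 0, sd_u)
mean(inv_logit(b0 + u))  # ~ 0.58, much closer to your empirical 0.56
```

The nonlinearity pulls the marginal average back toward 0.5, which is exactly the gap you observed.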

There’s good support for this situation via the ‘brmsmargins’ package (Bayesian marginal effects for brms models).

Thanks for the fast answer! I’ll look into this package. Just to clarify: this means that for nonlinear models I can’t just interpret the intercept as the “grand mean”, but rather have to marginalize over the random effects to get an estimate close to the empirical mean?

Well, I don’t think it’s true that the intercept in linear models in general has any interpretation as a ‘grand mean’ (what about, say, dummy coding?), but yes, marginal and conditional statements with respect to the random effects have different meanings. There’s a lot of useful discourse around the different possible predictions that can be relevant for multilevel models (try here: A guide to correctly calculating posterior predictions and average marginal effects with multilevel Bayesian models | Andrew Heiss).

For the logistic model, if the random effects are normally distributed on the linear-predictor scale (centered around the intercept), then they will necessarily be asymmetrically distributed on the manifest (probability) scale (clarification from @jsocolar: this is true as long as the intercept is not zero).


The intercept itself is on the logit scale in this model. And the intercept is a “grand mean” of the random effects distribution on the logit scale. But because the logit transform is nonlinear, the mean of the logits is not the same thing as the logit of the mean.

So in this intercept-only model, there is an interpretation of the intercept as a grand mean, but it doesn’t relate to the mean of the data in the way you’re expecting. Here’s a direct illustration in R:

# true mean (on the logit scale)
mu <- 2

a <- rnorm(1000, mu, 2)

# sample mean on the logit scale: approximately mu
mean(a)

# sample mean on the probability scale: not inv.logit(mu)
mean(boot::inv.logit(a))

# the integral that we are approximating with the above simulation
integrate(function(x) boot::inv.logit(x) * dnorm(x, mu, 2), -Inf, Inf)

# the equivalent integral on the logit scale, which indeed evaluates to mu
integrate(function(x) x * dnorm(x, mu, 2), -Inf, Inf)

Provided that the intercept isn’t zero. This is a very minor nit, but worth raising because somebody who wants to check this numerically might well end up checking the base case where the intercept is zero and then getting confused.
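To spell out that base case (a small sketch with a hand-rolled inverse logit): with the intercept at zero, the inverse-logit curve is symmetric around the mean of the random effects, so the mean of the probabilities and the probability at the mean coincide at 0.5:

```r
inv_logit <- function(x) 1 / (1 + exp(-x))

set.seed(1)
u <- rnorm(1e6, 0, 2)  # random effects around a zero intercept

mean(inv_logit(u))  # ~ 0.5, by symmetry
inv_logit(0)        # exactly 0.5
```

Any nonzero intercept breaks this symmetry, which is why the discrepancy only shows up away from zero.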

Yes, that’s an important clarification!