Hi,
I have tried to fit a generalized linear mixed model for predicting the probability of correctly identifying objects in images. This probability should depend on some fixed and random effects. Everything works fine until I include a random effect for the displayed image. Once this random effect is included, the model predicts probabilities that are higher than the empirical means for the specific conditions. The same thing happens when all other fixed and random effects are removed and only the influence of the image is examined:
 Family: bernoulli
  Links: mu = logit
Formula: correct_concept_detection ~ 1 + (1 | original_img_name)
   Data: final_data (Number of observations: 7200)
  Draws: 4 chains, each with iter = 3000; warmup = 1000; thin = 1;
         total post-warmup draws = 8000

Group-Level Effects:
~original_img_name (Number of levels: 40)
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     1.76      0.21     1.39     2.22 1.00      427      801

Population-Level Effects:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     0.48      0.27    -0.06     0.99 1.01      274      438

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The problem is that the intercept, which should represent the grand mean of the data, is too high. While 0.48 on the logit scale corresponds to a probability of about 0.62, the empirical mean is 0.56. Furthermore, the fitted random-effect deviations sum to -0.8 rather than zero. Another strange effect is that when I calculate predictions for each image (by adding the random effects to the intercept), those predictions are correct and correspond to the empirical means for each of the images. If I simplify the model even more and remove the random effect of image, the intercept is fitted correctly (about 0.24).
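In R, the conversions I am referring to are just:
plogis(0.48)   # ~0.62, the probability implied by the intercept of the model above
plogis(0.24)   # ~0.56, the intercept of the simplified model, matching the empirical mean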
My question would therefore be why it is possible that the intercept term does not represent the grand mean of the data. Did I miss something obvious?
Hi @lcreining, if I'm understanding you correctly, I believe the issue you're seeing is that you're implicitly making your interpretation conditional on the random effects (specifically, on them being set to zero). Because of the nonlinear transformation of the logit link function, the predicted probability for your subjects on average (the "grand mean") is not equivalent to the predicted probability for a hypothetical average subject (the probability indicated by the intercept).
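To make that concrete with the posterior means from your summary (a rough plug-in illustration, not something brms reports directly):
intercept <- 0.48   # posterior mean of the population-level intercept
sd_img    <- 1.76   # posterior mean of sd(Intercept) for original_img_name

# probability for a hypothetical average image (random effect fixed at zero)
plogis(intercept)                               # ~0.62

# average probability over the assumed normal distribution of image effects
set.seed(1)
mean(plogis(rnorm(1e5, intercept, sd_img)))     # ~0.58, much closer to the empirical 0.56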
Hi,
thanks for the fast answer! I'll look into this package. Just to clarify: this means that for nonlinear models I can't just interpret the intercept as the "grand mean", but rather have to marginalize over the random effects to get an estimate close to the empirical mean?
Well, I don't think it's true that the intercept in linear models in general has any interpretation as a "grand mean" (what about, say, dummy coding?), but yes, marginal and conditional statements with respect to the random effects have a different meaning. There's a lot of useful discourse around some of the different possible predictions that can be relevant for multilevel models (try here: A guide to correctly calculating posterior predictions and average marginal effects with multilevel Bayesian models | Andrew Heiss).
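For your model specifically, a sketch of the two kinds of predictions (assuming the fitted brmsfit object is called fit; that name is my assumption):
library(brms)

# conditional on the image effect being zero (a hypothetical average image)
epred_conditional <- posterior_epred(fit, re_formula = NA)
mean(epred_conditional)   # roughly plogis of the posterior intercept

# including the fitted image effects (marginal over the observed images)
epred_marginal <- posterior_epred(fit, re_formula = NULL)
mean(epred_marginal)      # close to the empirical mean of correct_concept_detection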
For the logistic model, if the random effects are normally distributed on the linear-predictor scale (centered around the intercept), then they will necessarily be distributed asymmetrically on the manifest (probability) scale (clarification from @jsocolar: this is true as long as the intercept is not zero).
The intercept itself is on the logit scale in this model. And the intercept is a "grand mean" of the random effects distribution on the logit scale. But because the logit transform is nonlinear, the mean of the logits is not the same thing as the logit of the mean.
So in this intercept-only model, there is an interpretation of the intercept as a grand mean, but it doesn't relate to the mean of the data in the way you're expecting. Here's a direct illustration in R:
# true mean on the logit scale
mu <- 2
# simulated "random effects" scattered around mu
set.seed(1)
a <- rnorm(1000, mu, 2)
# sample mean on the logit scale
sample_mean <- mean(a)
# inverse-logit of the mean (the quantity the intercept gives you)
boot::inv.logit(mu)
boot::inv.logit(sample_mean)
# mean of the inverse-logits (the quantity that matches the mean of the data)
mean(boot::inv.logit(a))
# the integral that we are approximating with the above simulation
integrate(function(x) {boot::inv.logit(x) * dnorm(x, mu, 2)}, -Inf, Inf)
# the equivalent integral on the logit scale, which indeed evaluates to mu
integrate(function(x) {x * dnorm(x, mu, 2)}, -Inf, Inf)
Provided that the intercept isn't zero. This is a very minor nit, but worth raising because somebody who wants to check this numerically might well end up checking the base case where the intercept is zero and then getting confused.
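A quick numeric check of that base case, in the same style as the simulation above:
boot::inv.logit(0)                                                        # 0.5
# by symmetry of inv.logit around zero, the marginal mean is also exactly 0.5
integrate(function(x) {boot::inv.logit(x) * dnorm(x, 0, 2)}, -Inf, Inf)   # 0.5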