(EDIT: This post has been severely overhauled since its initial appearance due to serious errors detected by the author. If you dismissed it as the ramblings of a village idiot on first reading, please take another look.)
Many thanks for the further clarifications! And sorry about being vague with the ML acronym: I didn't know it could mean anything other than maximum likelihood.
Thanks for the confirmation!
It seems to me that these two sentences embed two profound yet separate lessons that I'm only beginning to understand. Hence I must belabor the point further. Back to the original mock data:
> ni <- 20; pF = 0.75; pM = 0.90
> (Dat <- data.frame(Sex = c("Female","Male"), y = c(ni*pF, ni*pM), ni = c(ni,ni)))
     Sex  y ni
1 Female 15 20
2   Male 18 20
> summary(glm(y/ni ~ Sex, weights = ni, family = binomial, data = Dat))$coefficients
            Estimate Std. Error  z value   Pr(>|z|)
(Intercept) 1.098612  0.5163978 2.127454 0.03338242
SexMale     1.098612  0.9067647 1.211574 0.22567560
The sample-based logits of the two groups are identical relative to their respective baselines, and ML estimation, which uses the modes of the log-likelihood as point estimates, duly reports them as identical. Yet the non-intercept coefficient is estimated with greater uncertainty, even though there's no prior.
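If it helps to see where those two SEs come from arithmetically, here's a quick back-of-envelope sketch of my own (in Python, not output from the R session): the Wald variance of a single group's sample logit is 1/(n·p·(1−p)), and the treatment-coded SexMale coefficient, being a difference of two independent logits, gets the sum of the two group variances.

```python
import math

n, pF, pM = 20, 0.75, 0.90  # same mock data as above

# Wald variance of one group's sample logit: 1 / (n * p * (1 - p))
var_f = 1 / (n * pF * (1 - pF))  # Female = intercept
var_m = 1 / (n * pM * (1 - pM))  # Male

# The SexMale coefficient is a difference of two independent logits,
# so its variance is the SUM of the two group variances.
print(f"{math.sqrt(var_f):.7f}")          # 0.5163978, as in the glm output
print(f"{math.sqrt(var_f + var_m):.7f}")  # 0.9067647, as in the glm output
```

So the non-intercept SE is larger by construction: it inherits both groups' sampling noise.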
In this particular setting, it looks to me like the larger SE of the non-intercept is not due to the logit link or binomial likelihood, but to the fact to which you referred earlier in the context of Bayesian models, i.e. that the Male prediction derives from two coefficients rather than just one. For behold what happens when we model the same two logit values using Ordinary Least Squares with an identity link:
> Dat2 <- data.frame(Dat[c(1,1, 2,2),], row.names = NULL)
> Dat2$y.new <- with(Dat2, qlogis(y/ni)+c(-0.2, +0.2)); Dat2
     Sex  y ni     y.new
1 Female 15 20 0.8986123
2 Female 15 20 1.2986123
3   Male 18 20 1.9972246
4   Male 18 20 2.3972246
> summary(lm(y.new ~ Sex, data = Dat2))$coefficients
            Estimate Std. Error  t value   Pr(>|t|)
(Intercept) 1.098612  0.2000000 5.493061 0.03157991
SexMale     1.098612  0.2828427 3.884181 0.06034527
The coefficients are exactly the same as in the binomial model, and the SE of the non-intercept remains larger than that of the intercept!
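The balanced two-level design makes the arithmetic transparent. With two observations per group, each 0.2 from its group mean, the residual variance is RSS/df = 0.16/2, the intercept SE is the SE of the Female group mean, and the slope SE is exactly sqrt(2) times larger because the slope is a difference of two equally uncertain group means. Here's my own Python check of that arithmetic (assuming the lm output above):

```python
import math

# Two observations per group, each 0.2 away from its group mean:
rss = 4 * 0.2**2        # residual sum of squares = 0.16
sigma2 = rss / (4 - 2)  # residual variance, df = n - 2 = 2

se_intercept = math.sqrt(sigma2 / 2)           # SE of the Female group mean
se_slope = math.sqrt(sigma2 / 2 + sigma2 / 2)  # difference of two group means
print(f"{se_intercept:.7f}")  # 0.2000000
print(f"{se_slope:.7f}")      # 0.2828427, i.e. 0.2 * sqrt(2)
```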
If we now restore the intercept, making Female the reference level again, and fit the binomial model with a Bayesian N(0, \sigma) prior with a small \sigma (brms's 0 + Intercept syntax keeps the real intercept as an explicit coefficient, so the prior applies to it directly), we see what appears to be another reflection of the same phenomenon:
> fixef(Mod <- brm(y|trials(ni) ~ 0 + Intercept + Sex, prior = prior(normal(0, .25)), family = binomial, backend = "cmdstanr", data = Dat))
           Estimate Est.Error       Q2.5     Q97.5
Intercept 0.4563496 0.1973907  0.0714488 0.8368387
SexMale   0.2826627 0.2218723 -0.1447267 0.7202204
The non-reference level shrinks more. So, here's yet another way one might think about it, and I'd like to know if it's correct: the uncertainty difference arises because the reference level has a fixed baseline of comparison at 0, whereas the non-reference level has a non-fixed baseline of comparison estimated from the data. Both are aiming darts at identical bullseyes, but one is standing on solid ground while throwing, and the other is standing on a balance board.
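One way to poke at this without MCMC noise (a rough sketch of mine, not the brms machinery): find the posterior mode of the same model by adding the N(0, 0.25) log-prior to the binomial log-likelihood and running plain gradient ascent. With this tight, roughly symmetric posterior, the mode lands near the brms posterior means, and the SexMale coefficient indeed shrinks by a larger fraction of its MLE than the intercept does.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

sigma = 0.25       # prior SD, as in prior(normal(0, .25))
a = b = 0.0        # a = Intercept (Female logit), b = SexMale offset
for _ in range(20000):
    pF, pM = sigmoid(a), sigmoid(a + b)
    # d/da and d/db of [binomial log-lik + N(0, sigma) log-priors]
    grad_a = (15 - 20 * pF) + (18 - 20 * pM) - a / sigma**2
    grad_b = (18 - 20 * pM) - b / sigma**2
    a += 1e-3 * grad_a
    b += 1e-3 * grad_b

print(round(a, 2), round(b, 2))  # ~0.45 and ~0.28, near the brms Estimates
mle = math.log(15 / 5)           # both MLEs equal 1.0986...
print(a / mle, b / mle)          # SexMale retains a smaller fraction of its MLE
```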
There's also Jacob's second point, to the effect that with fixed n, the binomial (log-)likelihood is less certain about large logits than about small ones. Yes! I believe there's a parsimonious way to demonstrate this with the same mock data, without even going Bayesian:
summary(glm(y/ni ~ 0 + Sex, weights = ni, family = binomial, data = Dat))$coefficients
          Estimate Std. Error  z value    Pr(>|z|)
SexFemale 1.098612  0.5163978 2.127454 0.033382417
SexMale   2.197225  0.7453560 2.947886 0.003199549
Even though the Wald z value and Pr(>|z|) do diagnose the Male effect as "more significant" than the Female one due to its larger magnitude, the important part is the Std. Error, which is almost 50% larger than the Female one even though both are based on the same amount of data and have the same baseline of comparison, i.e. zero. A memorable lesson indeed!
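The same 1/(n·p·(1−p)) formula makes the mechanism explicit (again a quick Python check of mine, not R output): p(1−p) shrinks as p moves away from 0.5, so for fixed n the Fisher information about the logit drops, and the SE grows with the logit's magnitude.

```python
import math

def logit_se(p, n=20):
    # Wald SE of a single group's sample logit; the binomial Fisher
    # information about the logit is n * p * (1 - p)
    return 1 / math.sqrt(n * p * (1 - p))

print(f"{logit_se(0.75):.7f}")  # 0.5163978 — Female
print(f"{logit_se(0.90):.7f}")  # 0.7453560 — Male: same n, larger logit, larger SE
```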
But I've run into yet another dilemma! Look what happens when we make our normal prior ultra-diffuse:
fixef(Mod2 <- update(Mod, prior = prior(normal(0, 25))))
          Estimate Est.Error       Q2.5    Q97.5
Intercept 1.154554 0.5279791  0.1905524 2.251391
SexMale   1.260148 0.9647825 -0.4800386 3.277488
Over and over across refits, b_SexMale overshoots its MLE by a wider margin than b_Intercept does. Whaaat? The greater uncertainty (= larger SE) makes sense in light of everything stated above, but how can the prior be pulling the estimate away from zero rather than toward it? Even with \sigma = 25, the prior still has its mode and mean at 0, so if anything, that is the value it should pull the estimate toward. What's going on?
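For what it's worth, here's how I'd probe whether the overshoot is MCMC noise or something systematic (a brute-force numerical sketch of mine, not brms): treating the N(0, 25) prior as essentially flat over the relevant range, compute the posterior mean of each group's logit by quadrature over the binomial likelihood. The mean sits above the MLE for both groups, and further above it for the larger logit, which is at least consistent with what the refits keep showing:

```python
import math

def loglik(theta, y, n):
    # binomial log-likelihood as a function of the logit theta
    p = 1.0 / (1.0 + math.exp(-theta))
    return y * math.log(p) + (n - y) * math.log(1.0 - p)

def flat_prior_mean(y, n, lo=-10.0, hi=10.0, steps=20000):
    # midpoint-rule posterior mean of theta under a flat prior on theta
    h = (hi - lo) / steps
    num = den = 0.0
    for i in range(steps):
        t = lo + (i + 0.5) * h
        w = math.exp(loglik(t, y, n))
        num += t * w
        den += w
    return num / den

mle_f = math.log(15 / 5)  # Female logit MLE = 1.0986...
mle_m = math.log(18 / 2)  # Male logit MLE  = 2.1972...
mean_f = flat_prior_mean(15, 20)
mean_m = flat_prior_mean(18, 20)
print(f"{mean_f - mle_f:.2f}")  # 0.07 — modest overshoot of the mean past the mode
print(f"{mean_m - mle_m:.2f}")  # 0.24 — larger overshoot for the larger logit
```

So the posterior *mean* of a logit sits above the MLE (the mode) even with no pull from the prior, because the likelihood is right-skewed on the logit scale, and more skewed the larger the logit. Whether that fully accounts for the b_SexMale behavior is exactly what I'd like to know.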