Random effect adding offset to intercept in zero-centered model

I’m hoping someone can help me with something I can’t puzzle out, which is challenging my understanding of random effects terms.

Consider a centered GLMM-style model (i.e. all predictor variables are zero-mean, and factors are sum-coded). I would have expected a random effect on the intercept to add variance to the intercept, but not to change the fixed-effect estimate. This would be consistent with the model’s assumption that random-effect offsets (in the linear predictor) arise from a zero-mean Normal distribution. Why, instead, does adding a random effect on the intercept change the model’s fixed-effect intercept estimate?

Here’s an example using the epilepsy dataset:

epilepsy$Trt <- C(epilepsy$Trt, contr.sum) # set treatment to sum coding

# no random effect on intercept:
bprior1 <- prior(student_t(5,0,10), class = b)
fit1 <- brm(count ~ zAge + zBase * Trt,
            data = epilepsy, family = poisson(), prior = bprior1)

# model with random effect on intercept:
bprior2 <- prior(student_t(5,0,10), class = b) +
  prior(cauchy(0,2), class = sd)
fit2 <- brm(count ~ zAge + zBase * Trt + (1|patient),
            data = epilepsy, family = poisson(), prior = bprior2)


> summary(fit1)
 Family: poisson 
  Links: mu = log 
Formula: count ~ zAge + zBase * Trt 
   Data: epilepsy (Number of observations: 236) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 4000

Population-Level Effects: 
           Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept      1.84      0.03     1.79     1.89       3588 1.00
zAge           0.15      0.03     0.10     0.20       3664 1.00
zBase          0.59      0.01     0.57     0.62       2993 1.00
Trt1           0.10      0.03     0.04     0.15       3500 1.00
zBase:Trt1    -0.02      0.02    -0.05     0.00       3009 1.00

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
is a crude measure of effective sample size, and Rhat is the potential 
scale reduction factor on split chains (at convergence, Rhat = 1).

> summary(fit2)
 Family: poisson 
  Links: mu = log 
Formula: count ~ zAge + zBase * Trt + (1 | patient) 
   Data: epilepsy (Number of observations: 236) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 4000

Group-Level Effects: 
~patient (Number of levels: 59) 
              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept)     0.59      0.07     0.46     0.75        810 1.01

Population-Level Effects: 
           Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept      1.64      0.08     1.47     1.80       1227 1.00
zAge           0.09      0.09    -0.08     0.28       1161 1.01
zBase          0.73      0.08     0.56     0.90       1474 1.00
Trt1           0.13      0.09    -0.04     0.30       1155 1.01
zBase:Trt1    -0.03      0.08    -0.21     0.13       1575 1.00

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
is a crude measure of effective sample size, and Rhat is the potential 
scale reduction factor on split chains (at convergence, Rhat = 1).

So the Intercept estimate has changed from ~1.84 to ~1.64 after inclusion of the group-level term (1 | patient). The other fixed-effect estimates change as well, of course, since the intercept (the grand mean of the linear predictor) is now different.

Why does this happen? My working hypothesis is that, because of correlations between the fixed-effects terms, the random intercept can’t be included without offsetting everything else, for example if the design is not balanced.

[Apologies that this is more a general mixed-effects modelling question than a brms-specific one.]

Thanks

Tom

> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] lme4_1.1-21     Matrix_1.2-17   readxl_1.3.1    here_0.1        brms_2.9.0      Rcpp_1.0.1      forcats_0.4.0   stringr_1.4.0  
 [9] dplyr_0.8.3     purrr_0.3.2     readr_1.3.1     tidyr_0.8.3     tibble_2.1.3    ggplot2_3.2.0   tidyverse_1.2.1

Your patient data are probably unbalanced, that is, there are more observations for some patients than for others. As a result, the mean of the patient means is no longer the overall mean.
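The point about unbalanced data can be illustrated with a toy numeric sketch (hypothetical numbers, not the epilepsy data): with unequal group sizes, the unweighted mean of the group means differs from the overall observation mean.

```r
# Hypothetical toy data: two patients with unequal numbers of observations
y       <- c(10, 10, 10, 10, 20)      # patient A: four 10s; patient B: one 20
patient <- c("A", "A", "A", "A", "B")

overall_mean        <- mean(y)                         # weights every observation equally: 12
mean_of_group_means <- mean(tapply(y, patient, mean))  # weights every patient equally: 15

overall_mean        # 12
mean_of_group_means # 15
```

Loosely speaking, a model with a per-patient random intercept pulls the population-level intercept toward the patient-level (unweighted) mean rather than the plain observation mean, so the two fits need not agree on the intercept.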

Hi Paul, thanks for your reply. That might be it – except it seems to be the factor Trt that is unbalanced: each patient has 4 observations, but there are 28 patients in Trt == 0 and 31 in Trt == 1.

Hi Tom, the zero-mean constraint you get from sampling the random effects as MVN(0, Sigma) (non-centered) is only a ‘soft’ constraint.
Otherwise one would have to use a hard, variance-preserving zero-mean constraint, such as a QR-based sum-to-zero parameterization; see the forum.
Sampling with regularization, e.g. using the sigma in the normal sampling statement, ‘solves’ the identifiability issue (I’m not fully happy with that).
My 2 cents: a centered draw should be OK, but for non-centered parameterizations I prefer hard zero-mean constraints via QR.
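The soft-vs-hard distinction can be sketched in a couple of lines (a generic illustration, not brms/Stan internals): raw Normal draws are zero-mean only in expectation, whereas centering them imposes an exact sum-to-zero constraint.

```r
set.seed(1)
u_raw <- rnorm(5, mean = 0, sd = 1)  # raw group offsets: zero-mean only "softly", in expectation
u     <- u_raw - mean(u_raw)         # hard constraint: offsets now sum to exactly zero

mean(u_raw)  # generally nonzero in any finite sample
sum(u)       # zero up to floating-point error
```

Centering this way removes one degree of freedom and changes the marginal variance of the offsets, which is why variance-preserving constructions such as the QR-based approach mentioned above come up.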