Hi,
I am working on a categorical regression with a single predictor and a single group-level effect. The group-level effects however are not present in each response category (I measured multiple individuals per group). But the model still estimates a value for each grouping level within the response category.
Here is a simple example:
X = rnorm(18, 0, 1)
Y = rep(letters[1:3], each = 6)
G = rep(1:6, each = 3)
df <- dplyr::bind_cols(X = X, Y = Y, group = G)
X Y group
<dbl> <chr> <int>
1 -0.410 a 1
2 0.555 a 1
3 -1.71 a 1
4 -0.619 a 2
5 -1.55 a 2
6 -0.462 a 2
7 0.482 b 3
8 -0.972 b 3
9 1.10 b 3
10 -0.620 b 4
In this dataset group level 1 is only found in category “a”.
mod1 <- brm( Y ~ X + (1|group),
family = categorical(link = "logit", refcat = "a"),
data = df)
> posterior_summary(mod1) %>% round(digits = 3)
Estimate Est.Error Q2.5 Q97.5
b_mub_Intercept 0.452 3.260 -5.760 7.223
b_muc_Intercept 0.191 2.978 -6.142 6.229
b_mub_X 2.464 5.463 -4.154 15.879
b_muc_X 1.386 4.111 -6.159 11.838
sd_group__mub_Intercept 10.172 10.299 2.018 39.487
sd_group__muc_Intercept 11.310 11.976 2.158 41.000
r_group__mub[1,Intercept] -10.265 14.011 -43.116 0.529
r_group__mub[2,Intercept] -8.547 12.212 -37.523 1.356
r_group__mub[3,Intercept] 11.371 14.573 0.220 44.897
r_group__mub[4,Intercept] 10.562 12.806 0.272 40.739
r_group__mub[5,Intercept] -6.096 13.921 -40.179 9.429
r_group__mub[6,Intercept] -5.223 14.422 -40.256 10.334
r_group__muc[1,Intercept] -11.016 14.123 -49.811 0.497
r_group__muc[2,Intercept] -9.569 13.562 -44.353 1.344
r_group__muc[3,Intercept] -6.402 13.830 -42.161 8.826
r_group__muc[4,Intercept] -6.379 14.707 -43.210 9.205
r_group__muc[5,Intercept] 12.545 14.153 0.392 49.174
r_group__muc[6,Intercept] 12.435 15.262 0.111 46.443
This model estimates a value for group level 1 in categories “b” and “c”, even though that combination does not exist in the data. This seems strange to me. It seems like the model cannot effectively estimate r_group_mub[1, Intercept]
because there are no group 1’s in category b! And in practice, the sd
of these estimates are really high.
I tried a variety of methods to have the model only estimate the group level effects for the specific categories in which they are found, such as nested and partially nested structures, but nothing worked (which was probably intentional).
Is this model still valid as it is? Is there a different way to write the formula, or a different way to remove certain categorical combinations from the model? Or should I take a different approach to this problem all together?
Thanks for the advice!