I would like to fit a model wherein replicates belong to different groups (a multiple membership model), but importantly the replicates can differ in the number of groups to which they belong. What is the appropriate syntax to account for multiple membership when replicates belong to different numbers of groups? I have tried several alternatives, but the differing answers make me uncertain about which is correct.
From a previously answered question, it’s clear that filling in the extra group columns with
NAs will cause the model to drop these replicates entirely. We can reproduce that undesired behaviour with an example dataset:
set.seed(234) dat1 <- data.frame( y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100), g1 = sample(letters[1:10], 100, TRUE), g2 = sample(letters[1:10], 100, TRUE), g3 = sample(c(letters[1:10], NA_character_), 100, TRUE), stringsAsFactors = FALSE ) fit1 <- brm(y ~ x1 + (1|mm(g1, g2, g3)), data = dat1) sum(is.na(dat1$g3)) # 8 rows have missing g3 nrow(fit1$data) # model throws away rows with NA
The answer proposed using filling of the
NAs and weighting schemes to fit the model appropriately and with the complete dataset. I can think of three alternative strategies to do this:
(1) Fill extra groups using only the group levels to which the replicate is a member, balancing to achieve the appropriate weighting
tl;dr Clear that this works easily in some scenarios, but grows to be very tedious when replicates can belong to a wide range of group numbers.
This works only in limited cases. For example, when there are four group columns (
g4) and a replicate only belongs to two groups, one could code it as
g1 = "A",
g2 = "B",
g3 = "A",
g4 = "B". Each group would be given equal weights. This approach fails for the example dataset generated above, unless one were to expand the number of grouping columns to a values whose factors include all of the possible numbers of groups a replicate could belong to. Here, that would require at least 6 group columns to address the cases that belong to 2 vs. 3 groups (6 is the lowest number with factors 2 & 3).
(2) Introduce a new “dummy” group, with a corresponding weight of zero
tl;dr This behaves unexpectedly by estimating an intercept for the “dummy” group despite always having a weight of zero.
dat2 <- dat1 dat2$w1 <- ifelse(is.na(dat2$g3), 0.5, 1/3) dat2$w2 <- ifelse(is.na(dat2$g3), 0.5, 1/3) dat2$w3 <- ifelse(is.na(dat2$g3), 0, 1/3) dat2$g3[is.na(dat2$g3)] <- "dummy" # recode NAs to "dummy" in g3 fit2 <- update(fit1, . ~ x1 + (1|mm(g1, g2, g3, weights = cbind(w1, w2, w3))), newdata = dat2) nrow(fit2$data) == 100 # keeps all data as expected ranef(fit2) # fits intercept for "dummy" (unexpected & undesired behaviour)
The model should not estimate an intercept for
dummy because none of the replicates belong to that group…but it does. I suspect the estimate will always be nearly zero, but clearly is undesired behaviour.
(3) Haphazardly fill missing group with a pre-existing level, with a corresponding weight of zero
tl;dr The choice of group(s) used to replace
NAs may be affecting the model fit, despite always having a weight of zero.
We ought to be able to replace the
dummy group introduced above with any pre-existing group value provided that the corresponding weight remains
0. For convenience, let’s just use the value from
dat3a <- dat2 dat3a$g3[dat3a$g3 == "dummy"] <- dat3a$g1[dat3a$g3 == "dummy"] fit3a <- update(fit2, newdata = dat3a, seed = 984) ranef(fit3a) # no intercept for "dummy" (expected & desired behaviour)
It works! Provided that our choice of
dummy replacement doesn’t matter. But it appears that the choice can change the fit. We can see this by replacing
dummy with the value from
g2 and comparing the fits:
dat3b <- dat2 dat3b$g3[dat3b$g3 == "dummy"] <- dat3b$g2[dat3b$g3 == "dummy"] fit3b <- update(fit2, newdata = dat3b, seed = 984) identical(ranef(fit3a), ranef(fit3b)) # FALSE
It’s possible that the
seed argument isn’t behaving as I expect, and that the fits aren’t truly different. I don’t see the clear way to test this.
I would very much appreciate help/guidance as to how to appropriately handle this multiple membership problem.
- Operating System: OSX 10.14.6
- brms Version: 2.10.0