I would like to fit a model wherein replicates belong to different groups (a multiple membership model), but importantly the replicates can differ in the *number* of groups to which they belong. **What is the appropriate syntax to account for multiple membership when replicates belong to different numbers of groups?** I have tried several alternatives, but the differing answers make me uncertain about which is correct.

From a previously answered question, it’s clear that filling in the extra group columns with `NA`

s will cause the model to drop these replicates entirely. We can reproduce that undesired behaviour with an example dataset:

```
set.seed(234)
dat1 <- data.frame(
y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100),
g1 = sample(letters[1:10], 100, TRUE),
g2 = sample(letters[1:10], 100, TRUE),
g3 = sample(c(letters[1:10], NA_character_), 100, TRUE),
stringsAsFactors = FALSE
)
fit1 <- brm(y ~ x1 + (1|mm(g1, g2, g3)), data = dat1)
sum(is.na(dat1$g3)) # 8 rows have missing g3
nrow(fit1$data) # model throws away rows with NA
```

The answer proposed using filling of the `NA`

s and weighting schemes to fit the model appropriately and with the complete dataset. I can think of three alternative strategies to do this:

### (1) Fill extra groups using only the group levels to which the replicate is a member, balancing to achieve the appropriate weighting

*tl;dr Clear that this works easily in some scenarios, but grows to be very tedious when replicates can belong to a wide range of group numbers.*

This works only in limited cases. For example, when there are four group columns (`g1`

, `g2`

, `g3`

, `g4`

) and a replicate only belongs to two groups, one could code it as `g1 = "A"`

, `g2 = "B"`

, `g3 = "A"`

, `g4 = "B"`

. Each group would be given equal weights. This approach fails for the example dataset generated above, unless one were to expand the number of grouping columns to a values whose factors include **all** of the possible numbers of groups a replicate could belong to. Here, that would require at least 6 group columns to address the cases that belong to 2 vs. 3 groups (6 is the lowest number with factors 2 & 3).

### (2) Introduce a new “dummy” group, with a corresponding weight of zero

*tl;dr This behaves unexpectedly by estimating an intercept for the “dummy” group despite always having a weight of zero.*

```
dat2 <- dat1
dat2$w1 <- ifelse(is.na(dat2$g3), 0.5, 1/3)
dat2$w2 <- ifelse(is.na(dat2$g3), 0.5, 1/3)
dat2$w3 <- ifelse(is.na(dat2$g3), 0, 1/3)
dat2$g3[is.na(dat2$g3)] <- "dummy" # recode NAs to "dummy" in g3
fit2 <- update(fit1, . ~ x1 + (1|mm(g1, g2, g3,
weights = cbind(w1, w2, w3))),
newdata = dat2)
nrow(fit2$data) == 100 # keeps all data as expected
ranef(fit2) # fits intercept for "dummy" (unexpected & undesired behaviour)
```

The model should not estimate an intercept for `dummy`

because none of the replicates belong to that group…but it does. I suspect the estimate will always be nearly zero, but clearly is undesired behaviour.

### (3) Haphazardly fill missing group with a pre-existing level, with a corresponding weight of zero

*tl;dr The choice of group(s) used to replace NAs may be affecting the model fit, despite always having a weight of zero.*

We ought to be able to replace the `dummy`

group introduced above with any pre-existing group value provided that the corresponding weight remains `0`

. For convenience, let’s just use the value from `g1`

:

```
dat3a <- dat2
dat3a$g3[dat3a$g3 == "dummy"] <- dat3a$g1[dat3a$g3 == "dummy"]
fit3a <- update(fit2, newdata = dat3a, seed = 984)
ranef(fit3a) # no intercept for "dummy" (expected & desired behaviour)
```

It works! Provided that our choice of `dummy`

replacement doesn’t matter. But it appears that the choice can change the fit. We can see this by replacing `dummy`

with the value from `g2`

and comparing the fits:

```
dat3b <- dat2
dat3b$g3[dat3b$g3 == "dummy"] <- dat3b$g2[dat3b$g3 == "dummy"]
fit3b <- update(fit2, newdata = dat3b, seed = 984)
identical(ranef(fit3a), ranef(fit3b)) # FALSE
```

It’s possible that the `seed`

argument isn’t behaving as I expect, and that the fits aren’t truly different. I don’t see the clear way to test this.

I would very much appreciate help/guidance as to how to appropriately handle this multiple membership problem.

- Operating System: OSX 10.14.6
- brms Version: 2.10.0