Allow_new_levels in multi-membership

Dear colleagues,
when I run posterior_predict() on newdata that contains a new level of the grouping factor, I get an error even with allow_new_levels = TRUE.

I have 5 different categories, but some of the new data are not associated with any of the old levels. I labelled them “NoType”.
The grouping variables are simply labelled “Path1” to “Path5”.
When I run the code:

posterior_predict(BMMFT, newdata = zdata_t)
Error: Levels 'NoType' of grouping factor 'mmPath1Path2Path3Path4Path5' cannot be found in the fitted model. Consider setting argument 'allow_new_levels' to TRUE.

Then I set allow_new_levels = TRUE and get:

posterior_predict(BMMFT, newdata = zdata_t, allow_new_levels = TRUE)
Error in if (has_new_levels) { : missing value where TRUE/FALSE needed

(Sorry, my computer is set to German and I could not change the language of the R messages in RStudio. They sometimes come in English and sometimes in German, so I have translated the German parts above.)

This seems like it should be straightforward, but I do not understand the reason for this error.
Is there a setting in brm() when fitting the model that is necessary to later use new levels?

Thank you in advance for your attention
Guilherme

  • Operating System:
    Windows 10, RStudio 1.3.9, R 4.0.1
  • brms Version: 2.13.0

Looks like a bug to me. Can you try with the latest brms version? If that does not solve the problem, please provide a minimal reproducible example so that I can investigate the problem.
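For readers unfamiliar with the second message: it is the error R raises whenever an `if` condition evaluates to `NA` rather than `TRUE` or `FALSE`. The base-R sketch below only illustrates that general behaviour. The variable `flag` is a stand-in chosen for illustration, and what made `has_new_levels` come out as `NA` inside brms is not shown in this thread.

```r
# Base R only; this does not reproduce brms internals.
flag <- NA  # a logical value that is neither TRUE nor FALSE

# `if` cannot branch on NA, so this raises an error, which we capture here.
res <- tryCatch(
  if (flag) "new levels" else "no new levels",
  error = function(e) e
)
print(conditionMessage(res))  # printed in the language of the R session

# isTRUE() is one defensive way to treat NA like FALSE.
safe <- if (isTRUE(flag)) "new levels" else "no new levels"
print(safe)
```

Whether treating `NA` as `FALSE` is the right choice depends on the situation; here it only shows why the condition has to be a definite `TRUE` or `FALSE`.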

Hello,
thank you for the prompt reply!
I updated brms but still got the same error.

Sorry, I did not know how to make it reproducible in a shorter way. Could you please try the following?

Below I generate some random data of the same type and in the same range of units as the real data.


library(brms)

#### I have a few predictors, metric and ordered. The response is "rloss".
#### Each data point may be associated with up to 5 categories (multi-membership).
    Memberships = c("1Levee","2River","3Urban","4Flash","5Ground")


    df <- data.frame(rloss = rbeta(500,2,9),
                     WaterDepth = rnorm(500,0,2),
                     BuildingArea = rnorm(500,0,2),
                     Duration = rnorm(500,0,2),
                     Contamination = ordered(sample(0:2,500, TRUE)),
                     PLPM = ordered(sample(0:2,500, TRUE)),
                     FloodExperience = ordered(sample(0:4,500, TRUE)),
                     Insured = factor(sample(0:1,500, TRUE)),

#### These are the 5 multi-membership columns. I have tried them both as factors and as plain character variables.
                     Path1 = factor(sample(Memberships,500, TRUE)),
                     Path2 = factor(sample(Memberships,500, TRUE)),
                     Path3 = factor(sample(Memberships,500, TRUE)),
                     Path4 = factor(sample(Memberships,500, TRUE)),
                     Path5 = factor(sample(Memberships,500, TRUE)),
#### And the 5 respective weights.
                     w1 = sample(c(0.2,0.4,1.0), 500, TRUE),
                     w2 = sample(c(0.2,0.4,1.0), 500, TRUE),
                     w3 = sample(c(0.2,0.4,1.0), 500, TRUE),
                     w4 = sample(c(0.2,0.4,1.0), 500, TRUE),
                     w5 = sample(c(0.2,0.4,1.0), 500, TRUE)
    )

#### Here is the formula. Only the metric variables are under the multi-membership for varying-slopes.
    b_formula = brms::bf(rloss ~ 1 + WaterDepth + BuildingArea + Duration +
                           mo(Contamination) + mo(PLPM) + mo(FloodExperience) + Insured +
                           (WaterDepth + BuildingArea + Duration + Insured ||
                              mm(Path1, Path2, Path3, Path4, Path5,
                                 weights = cbind(w1, w2, w3, w4, w5), scale = TRUE)))

#### I did this to speed up the fitting with the random data; it converges faster this way.
df$rloss <- boot::inv.logit(df$WaterDepth*rnorm(500,0.5,0.01) + df$Duration*rnorm(500,-0.2,0.05) + rnorm(500,0,0.1))

#### I don't expect this to be the problem, but anyway, I defined the priors like this:
b_prior <- c(
  set_prior("normal(+0.0, 1.0)", class = "Intercept"),
  set_prior("normal(+0.5, 0.35)", class = "b", coef="WaterDepth"),
  set_prior("normal(-0.5, 0.35)", class = "b", coef="BuildingArea"),
  set_prior("normal(+0.5, 0.35)", class = "b", coef="Duration"),
  set_prior("normal(+0.5, 0.35)", class = "b", coef="moContamination"),
  set_prior("normal(-0.5, 0.35)", class = "b", coef="moPLPM"),
  set_prior("normal(-0.5, 0.35)", class = "b", coef="moFloodExperience"),
  set_prior("normal(+0.0, 0.50)", class = "b", coef="Insured1"),
  set_prior("gamma(2,5)", class = "sd"),
  set_prior("gamma(0.1,0.1)", class = "phi")
)

# Fit the model
fit1 <- brms::brm(formula = b_formula,
                  data = df,
                  family = 'Beta',
                  prior = b_prior
)

# Using the same data is fine, the function works
pre1 <- brms::posterior_predict(fit1, newdata=df)

# Now I change some cases to a new category. I also tried adding the new level to the other columns, or keeping the 5 columns as plain character vectors; the result is the same.
df2 <- df
df2$Path1 <- as.character(df2$Path1)
df2$Path1[1:5] <- "NoType"
df2$Path1 <- factor(df2$Path1)
df2$Path2 <- factor(df2$Path2, levels=c("1Levee", "2River", "3Urban", "4Flash", "5Ground","NoType"))
df2$Path3 <- factor(df2$Path3, levels=c("1Levee", "2River", "3Urban", "4Flash", "5Ground","NoType"))
df2$Path4 <- factor(df2$Path4, levels=c("1Levee", "2River", "3Urban", "4Flash", "5Ground","NoType"))
df2$Path5 <- factor(df2$Path5, levels=c("1Levee", "2River", "3Urban", "4Flash", "5Ground","NoType"))

# Try with the new category
pre2 <- brms::posterior_predict(fit1, newdata = df2, allow_new_levels = TRUE)

And the error is:

Error in if (has_new_levels) { : missing value where TRUE/FALSE needed
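To confirm which levels in the new data were never seen during fitting, a base-R check along the following lines may help. This is only a sketch: `find_new_levels`, `train_df` and `new_df` are hypothetical names introduced here for illustration and are not part of brms.

```r
# Hypothetical helper (not part of brms): return the levels that occur in the
# multi-membership columns of `newdata` but in none of those columns of `data`.
find_new_levels <- function(data, newdata, cols) {
  old <- unique(unlist(lapply(data[cols], as.character)))
  new <- unique(unlist(lapply(newdata[cols], as.character)))
  setdiff(new, old)
}

# Tiny stand-in data with two membership columns.
train_df <- data.frame(
  Path1 = c("1Levee", "2River"),
  Path2 = c("3Urban", "1Levee"),
  stringsAsFactors = FALSE
)
new_df <- train_df
new_df$Path1[1] <- "NoType"

print(find_new_levels(train_df, new_df, c("Path1", "Path2")))
```

With the objects from the example above, the equivalent call would be `find_new_levels(df, df2, paste0("Path", 1:5))`.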


Windows 10, x86_64, mingw32
R version 4.0.1
RStudio 1.3.959
brms 2.14.4

Thank you very much for your support!
Guilherme

Thanks. This was a bug in brms that should now be fixed on GitHub.
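One common way to get a fix that is only on GitHub is to install the development version of the package. The snippet below is a sketch under two assumptions not stated in this thread: that the `remotes` package is installed, and that the fix is on the default branch of the `paul-buerkner/brms` repository.

```r
# Assumption: 'remotes' is installed (install.packages("remotes") otherwise)
# and the fix is on the default branch of the brms GitHub repository.
remotes::install_github("paul-buerkner/brms")

# After restarting R, retry the call from the example above:
# pre2 <- brms::posterior_predict(fit1, newdata = df2, allow_new_levels = TRUE)
```

Building from source may also require a working C++ toolchain (Rtools on Windows).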

Thank you, Paul, it worked now!