Hi Paul,
There seems to be an inconsistency in how brms::posterior_epred() works for a mixed effects model with nested random grouping factors depending on how re_formula is set up when random grouping factors present in the model are set to NA in newdata.
I found some data on the Internet which I will use to illustrate the problem: classes nested in schools. In this context, I am primarily interested in inference on the effect of a school-level variable.
As per https://lme4.r-forge.r-project.org/slides/2009-07-01-Lausanne/3SimpleD.pdf:
“The classroom data are a cross-section of students within classes within schools. The mathgain variable is the difference in mathematics achievement scores in grade 1 and kindergarten.”
### import classroom data in R
classroom <- read.csv("http://www-personal.umich.edu/~bwest/classroom.csv")
classroom <- dplyr::arrange(classroom, classid, schoolid)
### examine structure of classroom dataset
str(classroom)
### convert classid and schoolid to factors (as they will be treated as random grouping factors)
classroom$classid <- factor(classroom$classid)
classroom$schoolid <- factor(classroom$schoolid)
### need a binary school-level variable; focus will be on conducting inference on the effect of this variable
range(classroom$housepov)
classroom$housepov <- ifelse(classroom$housepov > 0.250, 1, 0)
table(classroom$housepov)
### fit brms model
model <- brm(mathgain ~ 1 + housepov + (1|schoolid/classid),
data = classroom)
### compare expected values of mathgain response
### for "typical" school where housepov = 1 versus
### "typical" school where housepov = 0
### ignoring the within-school variability between classes
epred_typical_school <- brms::posterior_epred(
model,
newdata = data.frame(housepov = c(0, 1),
schoolid = NA,
classid = NA),
re_formula = ~ (1|schoolid))
epred_typical_school
### compare expected values of mathgain response
### for "typical" class within typical school with housepov = 1 versus
### "typical" class within typical school with housepov = 0
### accounting for the within-school variability between classes
epred_typical_class_typical_school <- brms::posterior_epred(
model,
newdata = data.frame(housepov = c(0, 1),
schoolid = NA,
classid = NA),
re_formula = ~ (1|schoolid/classid))
The inconsistency that I get comes from the re_formula being specified as re_formula = ~ (1|schoolid/classid) when both schoolid and classid are specified as NA in newdata. This throws R off and yields the following error message:
Error: Levels ‘NA_NA’ of grouping factor ‘schoolid:classid’ cannot be found in the fitted model. Consider setting argument ‘allow_new_levels’ to TRUE.
It seems to me that, when schoolid and classid are both set to NA in newdata:
R should NOT complain when encountering re_formula = ~ (1|schoolid/classid) if it does NOT complain when encountering re_formula = ~ (1|schoolid)
OR
R should complain when encountering re_formula = ~ (1|schoolid/classid) but should also complain when encountering re_formula = ~ (1|schoolid)
Can you please look into this inconsistency for me and see if it’s something that can be (hopefully easily) fixed to avoid confusing the users of brms?
This inconsistency is also an issue for tidybayes::add_epred_draws() so if it is fixed for brms it will work for tidybayes too.
Thanks very much!
Isabella
- Operating System:
R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
- brms Version: brms_2.16.2