Posterior_epred is inconsistent in how it handles re_formula when random grouping factors are set to NA in newdata

isabellaghement · November 24, 2021, 3:47am

Hi Paul,

There seems to be an inconsistency in how brms::posterior_epred() works for a mixed effects model with nested random grouping factors depending on how re_formula is set up when random grouping factors present in the model are set to NA in newdata.

I found some data on the Internet which I will use to illustrate the problem: classes nested in schools. In this context, I am primarily interested in inference on the effect of a school-level variable.

As per https://lme4.r-forge.r-project.org/slides/2009-07-01-Lausanne/3SimpleD.pdf:

“The classroom data are a cross-section of students within classes within schools. The mathgain variable is the difference in mathematics achievement scores in grade 1 and kindergarten.”

### import classroom data in R 
classroom <- read.csv("http://www-personal.umich.edu/~bwest/classroom.csv")
classroom <- dplyr::arrange(classroom, classid, schoolid)

### examine structure of classroom dataset 
str(classroom)

### convert classid and schoolid to factors (as they will be treated as random grouping factors) 
classroom$classid <- factor(classroom$classid)
classroom$schoolid <- factor(classroom$schoolid)

### need a binary school-level variable; focus will be on conducting inference on the effect of this variable
range(classroom$housepov)
classroom$housepov <- ifelse(classroom$housepov > 0.250, 1, 0)
table(classroom$housepov)

### fit brms model 
model <- brm(mathgain ~ 1 + housepov + (1|schoolid/classid),
            data = classroom)

### compare expected values of mathgain response 
### for "typical" school where housepov = 1 versus 
###     "typical" school where housepov = 0
### ignoring the within-school variability between classes 

epred_typical_school <-  brms::posterior_epred( 
  model, 
  newdata = data.frame(housepov = c(0, 1), 
                       schoolid = NA, 
                       classid = NA), 
  re_formula = ~ (1|schoolid))   

epred_typical_school 


### compare expected values of mathgain response  
### for "typical" class within typical school with housepov = 1 versus 
###     "typical" class within typical school with housepov = 0 
### accounting for the within-school variability between classes 

epred_typical_class_typical_school <-  brms::posterior_epred( 
  model, 
  newdata = data.frame(housepov = c(0, 1), 
                       schoolid = NA, 
                       classid = NA),
  re_formula = ~ (1|schoolid/classid))

The inconsistency that I get comes from the re_formula being specified as re_formula = ~ (1|schoolid/classid) when both schoolid and classid are specified as NA in newdata. This throws R off and yields the following error message:

Error: Levels ‘NA_NA’ of grouping factor ‘schoolid:classid’ cannot be found in the fitted model. Consider setting argument ‘allow_new_levels’ to TRUE.

It seems to me that, when schoolid and classid are both set to NA in newdata:

R should NOT complain when encountering re_formula = ~ (1|schoolid/classid) if it does NOT complain when encountering re_formula = ~ (1|schoolid)

OR

R should complain when encountering re_formula = ~ (1|schoolid/classid) but should also complain when encountering re_formula = ~ (1|schoolid)

Can you please look into this inconsistency for me and see if it’s something that can be (hopefully easily) fixed to avoid confusing the users of brms?

This inconsistency is also an issue for tidybayes::add_epred_draws() so if it is fixed for brms it will work for tidybayes too.

Thanks very much!

Isabella

Operating System:

R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

brms Version: brms_2.16.2

paul.buerkner · November 28, 2021, 10:02pm

I confirm this is a bug and I will work on a fix.

Paul

isabellaghement · November 28, 2021, 11:32pm

Thanks very much, Paul! I very much appreciate you taking the time to look into this issue.

paul.buerkner · November 29, 2021, 1:41pm

On github all of these inputs now consistently throw an error.

Topic		Replies	Views
Posterior_epred with zero inflated count model Modeling brms	2	175	April 4, 2024
Errors with BLUPs with complex random effects in brms using posterior_epred brms techniques , specification , brms	2	563	June 25, 2021
Random slopes with posterior_linpred brms	4	545	March 12, 2019
Help me understand the ouput of the posterior_epred function :-) brms	2	500	July 27, 2020
Find the predictor value at which posterior_epred(model) = x (in a multilevel logistic regression) brms specification , logistic-regression , conditional-effects , brms	8	174	February 25, 2025

Posterior_epred is inconsistent in how it handles re_formula when random grouping factors are set to NA in newdata

Related topics