Varying effects crossed vs. nested in brms

I’m currently working on a dataset that requires specifying varying effects that account for a particular sort of group-level variation. Here is a toy example of my issue:

Assume that I have a dataset in which three researchers (A, B, C) collect data on students’ English proficiency (on a continuous scale from 0 to 10) and the grades the students get in English class (continuous, 1 to 4). The goal is to predict students’ grades from their proficiency. The three researchers survey students in four schools; sometimes the same school is surveyed twice or even three times. Afterwards, we notice that in individual schools the researchers have measured students’ proficiency differently from their colleagues, yet it is not clear whether there is a systematic effect. In other words, we have strong random outliers in individual school-researcher combinations, but since not every school has been surveyed by every researcher, we cannot determine whether any particular measurement is an outlier. The following code creates a toy dataset illustrating the problem:

df = data.frame(
  'grade' = c(2, 4, 1, 2, 4, 5, 6, 1, 2, 2),
  'proficiency' = c(5, 9, 2, 10, 2, 3, 4, 10, 8, 9),
  'school' = c('I', 'II', 'III', 'IV', 'II', 'III', 'I', 'II', 'III', 'IV'),
  'researcher' = c('A', 'B', 'C', 'A', 'A', 'B', 'C', 'C', 'A', 'B')
)
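Cross-tabulating the toy df shows how unevenly the school-researcher combinations are covered (empty cells are combinations that were never surveyed):

```r
# Count how often each researcher surveyed each school
with(df, table(school, researcher))
```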

In this dataset, researcher A is an outlier in schools II and III but not in school IV. So the effect is tied to the researcher, but only with respect to certain schools rather than in general (A measured too low in school II, too high in school III, and about right in school IV).
My question therefore is how to specify varying effects in brms in such a way that I can account for this variation (if that is even possible). The options I can think of are:

mod1 <- brm(grade ~ proficiency + (1|school) + (1|researcher), data = df)

and

mod2 <- brm(grade ~ proficiency + (1|researcher/school) , data = df)
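If I understand the nested syntax correctly, (1|researcher/school) is shorthand in the lme4-style formulas that brms uses, so mod2 could equivalently be written with the implied interaction spelled out:

```r
# (1|researcher/school) expands to a varying intercept for researcher plus
# one for every researcher:school combination
mod2_expanded <- brm(grade ~ proficiency + (1|researcher) + (1|researcher:school),
                     data = df)
```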

Which one comes closest to what I’d like to achieve? Or do I need to specify the model in some other way?

Thanks in advance!

  • Operating System: Windows 10
  • brms Version: 2.14.4

If I’m following you correctly, it seems like the ideal model (granted a balanced design) would be:

mod3 <- brm(grade ~ proficiency + (1|school) + (1|researcher) + (1|school:researcher), data = df)

Since your design is not balanced and since you have so few levels within your groups, both of the models you proposed seem reasonable. One possibility would be to run a little simulation, analyzing the data both ways and comparing fit.
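Such a simulation-based comparison might be sketched as follows (illustrative only, given the tiny toy df; add_criterion() and loo_compare() are the brms helpers for approximate leave-one-out cross-validation):

```r
library(brms)

# Fit both candidate structures to the same data
mod1 <- brm(grade ~ proficiency + (1|school) + (1|researcher), data = df)
mod2 <- brm(grade ~ proficiency + (1|researcher/school), data = df)

# Attach PSIS-LOO estimates and compare expected out-of-sample fit
mod1 <- add_criterion(mod1, "loo")
mod2 <- add_criterion(mod2, "loo")
loo_compare(mod1, mod2)
```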

Thanks very much for your reply! I tested this configuration on a toy dataset I built, as the following step-by-step code shows:

#general set-up of 400 observations and 4 schools
df = data.frame(
  'proficiency' = runif(400, 0, 5),
  'school' = rep(c('I', 'II', 'III', 'IV'), 100)
)

df$grade = df$proficiency*0.5 #population-level slope

df$proficiency[df$school == 'I'] = df$proficiency[df$school == 'I'] + 2 #unique intercept for school I
df$proficiency[df$school == 'II'] = df$proficiency[df$school == 'II'] + 5 #unique intercept for school II
df$proficiency[df$school == 'III'] = df$proficiency[df$school == 'III'] + 1 #unique intercept for school III

df = df[sample(400,300),] #take out 100 observations so that not all schools are equally represented

df$researcher = rep(c('A', 'B', 'C'), 100) #add three researchers

df$proficiency[df$researcher == 'A'] = df$proficiency[df$researcher == 'A'] - 1 #unique intercept for researcher A
df$proficiency[df$researcher == 'B'] = df$proficiency[df$researcher == 'B'] + 1 #unique intercept for researcher B
df$proficiency[df$researcher == 'C'] = df$proficiency[df$researcher == 'C'] - 3 #unique intercept for researcher C

outliers = sample(100,50) #determine 50 observations to be outliers in researcher A's survey
df$proficiency[df$researcher == 'A'][outliers] = df$proficiency[df$researcher == 'A'][outliers]+rnorm(50, 10, 5) #make A systematically overcount those 50 observations
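As a quick sanity check on the simulated data (before fitting anything), the per-researcher means should already show A sitting well above B and C because of the injected outliers:

```r
# Mean measured proficiency per researcher; A should be inflated by the
# 50 outlier observations added above
aggregate(proficiency ~ researcher, data = df, FUN = mean)
```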

#build models with different varying intercept terms
mod_interaction <- brm(grade ~ proficiency + (1|school) + (1|researcher) + (1|school:researcher), data = df)
mod_base <- brm(grade ~ proficiency + (1|school) + (1|researcher), data = df)

The issue now is that there is little difference in the population-level effects between the models. I was hoping that the added term (1|school:researcher) would account for some of A’s idiosyncrasies, but it seems like A’s outliers are merely treated as noise. Moreover, both models severely underestimate the population-level slope (which should be 0.5): both return an estimate between 0.03 and 0.07 (95% CI).
After testing the models on the simulated dataset, I’m not sure where the problem lies. Did I misspecify the simulated dataset, or do the varying effects obscure the real relationship more than they help? My goal is, after all, to specify a model that accounts for school and researcher effects as well as for the fact that researcher A randomly overcounts certain schools. I’d be glad if you could help me see what went wrong here.
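One way I checked whether (1|school:researcher) picks anything up is to look at the estimated group-level intercepts directly (ranef() is the brms accessor for the varying effects):

```r
# Posterior summaries of the varying intercepts for each school:researcher
# cell; if the term were absorbing A's bias, the A cells should stand out
ranef(mod_interaction)$`school:researcher`
```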

Consider the following:

mod_interaction2 <- 
  brm(grade ~ proficiency + (1 + proficiency |school) + (1 + proficiency|researcher) + (1 + proficiency|school:researcher), 
      data = df,
      cores = 4,
      control = list(adapt_delta = .99))

Thanks! That seems to improve the estimate, yet this time the slope is between -0.24 and 0.89, which is perhaps because this model set-up needs more data to accurately estimate the correct values. At least there is no extreme underestimation of the population-level slope.
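For reference, the population-level slope and its 95% interval can be pulled out of the fit directly (fixef() is the brms accessor for population-level estimates):

```r
# Estimate, Est.Error, Q2.5 and Q97.5 for the proficiency slope
fixef(mod_interaction2)["proficiency", ]
```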
