Specification of grouping factor in newdata to match re_formula in nested hierarchical model with BRMS

Kostasp · May 15, 2022, 8:03am

Hi all I am running a 3-level hierarchical linear model with brms to estimate between-runners and within-runners between races variability. The model is the following :

model1 <- 
 brm( 
  LNSPEED ~ 1 + linear_year + TIME + I(TIME^2) + I(TIME^3) + 
   TIME:linear_year + I(TIME^2):linear_year + I(TIME^3):linear_year + 
   (1+TIME|ID) + (1+TIME|ID:YEAR), 
  iter=2000, 
  control = list(adapt_delta = 0.95, max_treedepth=15), 
  seed=12345, 
  data=Dataset_1
)

where within-race LNSPEED is modeled as a cubic finction of within-race TIME , ID is a grouping factor of the individual runners and YEAR is a grouping factor of individual race (thus ID:YEAR is a grouping factor indicating within-runner between races).

I am estimating various predicted marginal effects:

The expected grand mean:

all_RUNNERS_1_dist <- 
 model1 %>% 
 epred_draws(
  newdata = expand_grid(linear_year=seq(0,1, by=0.25), TIME = seq(0, 8, by = 0.25)), 
  re_formula = NA
 )

The expected average performance of runners:

all_RUNNERS_1_avg_perform_dist <-
 model1 %>% 
 epred_draws(
  newdata = expand_grid(linear_year=seq(0,1, by=0.25), TIME = seq(0, 8, by = 0.25), ID = unique(Dataset_1$ID)),
  re_formula =~ (1+TIME | ID)
 )

However I cannot get an estimate for the expected performance of individual races within runners:

all_RUNNERS_1_race2race_perform_dist <- 
 model1 %>% 
 epred_draws(
  newdata = expand_grid(linear_year=seq(0,1, by=0.25), TIME = seq(0, 8, by = 0.25), **ID = unique(Dataset_1$ID), YEAR=unique(Dataset_1$YEAR)** ), 
  re_formula =~ (1+TIME | ID)+(1+TIME | ID:YEAR)
 )

More specifically ID = unique(Dataset_1$ID), YEAR=unique(Dataset_1$YEAR) produces an error since my data is unbalanced given that not all runners have competed in all races (the dataset includes seven races and runners have competed 1-6 races each).

I cannot get around on how the YEAR factor should be specified in the newdata argument to match the re_formula =~ (1+TIME | ID)+(1+TIME | ID:YEAR).

Thank you in advance.

edited by @jsocolar for syntax highlighting and code formatting.

edm · June 4, 2022, 3:59pm

Are you sure you want ID:YEAR here, instead of just YEAR? I believe that you would use ID:YEAR if, within a race, runner IDs stood for different runners. For example, runner 1 in race 1 is a different person from runner 1 in race 2. If the runner IDs always stand for the same person across races, then I think you would want (1 + TIME | ID) + (1 + TIME | YEAR) and then I think this solves your problem.

Kostasp · June 4, 2022, 5:44pm

The runner IDs stand for the same person across races (i.e. runner 1 in race 1 and runner 1 in race 2 is the same person). I do feel that (1 + TIME | YEAR) would just deliver the variation in intercepts and TIME slopes across YEAR. However I am interested in the variation of runners WITHIN races, thus ID:YEAR is in my opinion the correct model specification.

edm · June 4, 2022, 7:45pm

I think I missed the part about having multiple measures on each runner within a race, so I could see how ID:YEAR could be useful (though maybe also random effects wrt YEAR? not sure).

I wonder whether it helps to create a new IDYEAR column that you could use in place of the colon operator:

Dataset_1$IDYEAR <- with(Dataset_1, interaction(ID, YEAR, drop = TRUE))

Then use (1 + TIME | IDYEAR) in the model specification instead of ID:YEAR and expand_grid() with IDYEAR instead of with YEAR.

Kostasp · June 5, 2022, 2:24pm

Thank you very much for the suggestion, this is what I somehow had in mind but I could figure out how to specify a unique combination between ID and YEAR. So just to make sure that I understand correctly, is the argument below what you propose?

newdata = expand_grid(linear_year=seq(0,1, by=0.25), TIME = seq(0, 8, by = 0.25), ID = unique(Dataset_1$ID), IDYEAR=unique(Dataset_1$IDYEAR)),
re_formula =~ (1+TIME | ID)+(1+TIME | IDYEAR) )

edm · June 5, 2022, 6:58pm

Yes, that looks right. You might verify that the model with IDYEAR yields similar estimates to the one with ID:YEAR, but I think they are the same model.

Kostasp · June 5, 2022, 7:49pm

Yeap, I’ve already done that; both model specifications yield very similar estimates.
Thank you for even bothering respond to a 3weeks old post, really appreciated!!

Topic		Replies	Views
Model specification in brms Modeling brms	0	456	May 17, 2022
Understanding brms nested grouping term in multivariate response models brms hierarchical-model , multivariate-normal	4	997	December 5, 2024
Questions on the individualized formulas in HLM General hierarchical-model , brms	1	594	April 17, 2022
How to fit a nested hierarchical model? Modeling specification , hierarchical-model , brms	6	227	February 1, 2025
Individuals nested within groups brms	2	678	August 20, 2021

Specification of grouping factor in newdata to match re_formula in nested hierarchical model with BRMS

Related topics