Specification of grouping factor in newdata to match re_formula in nested hierarchical model with BRMS

Hi all I am running a 3-level hierarchical linear model with brms to estimate between-runners and within-runners between races variability. The model is the following :

model1 <- 
  LNSPEED ~ 1 + linear_year + TIME + I(TIME^2) + I(TIME^3) + 
   TIME:linear_year + I(TIME^2):linear_year + I(TIME^3):linear_year + 
   (1+TIME|ID) + (1+TIME|ID:YEAR), 
  control = list(adapt_delta = 0.95, max_treedepth=15), 

where within-race LNSPEED is modeled as a cubic finction of within-race TIME , ID is a grouping factor of the individual runners and YEAR is a grouping factor of individual race (thus ID:YEAR is a grouping factor indicating within-runner between races).

I am estimating various predicted marginal effects:

The expected grand mean:

all_RUNNERS_1_dist <- 
 model1 %>% 
  newdata = expand_grid(linear_year=seq(0,1, by=0.25), TIME = seq(0, 8, by = 0.25)), 
  re_formula = NA

The expected average performance of runners:

all_RUNNERS_1_avg_perform_dist <-
 model1 %>% 
  newdata = expand_grid(linear_year=seq(0,1, by=0.25), TIME = seq(0, 8, by = 0.25), ID = unique(Dataset_1$ID)),
  re_formula =~ (1+TIME | ID)

However I cannot get an estimate for the expected performance of individual races within runners:

all_RUNNERS_1_race2race_perform_dist <- 
 model1 %>% 
  newdata = expand_grid(linear_year=seq(0,1, by=0.25), TIME = seq(0, 8, by = 0.25), **ID = unique(Dataset_1$ID), YEAR=unique(Dataset_1$YEAR)** ), 
  re_formula =~ (1+TIME | ID)+(1+TIME | ID:YEAR)

More specifically ID = unique(Dataset_1$ID), YEAR=unique(Dataset_1$YEAR) produces an error since my data is unbalanced given that not all runners have competed in all races (the dataset includes seven races and runners have competed 1-6 races each).

I cannot get around on how the YEAR factor should be specified in the newdata argument to match the re_formula =~ (1+TIME | ID)+(1+TIME | ID:YEAR).

Thank you in advance.

edited by @jsocolar for syntax highlighting and code formatting.

Are you sure you want ID:YEAR here, instead of just YEAR? I believe that you would use ID:YEAR if, within a race, runner IDs stood for different runners. For example, runner 1 in race 1 is a different person from runner 1 in race 2. If the runner IDs always stand for the same person across races, then I think you would want (1 + TIME | ID) + (1 + TIME | YEAR) and then I think this solves your problem.

The runner IDs stand for the same person across races (i.e. runner 1 in race 1 and runner 1 in race 2 is the same person). I do feel that (1 + TIME | YEAR) would just deliver the variation in intercepts and TIME slopes across YEAR. However I am interested in the variation of runners WITHIN races, thus ID:YEAR is in my opinion the correct model specification.

I think I missed the part about having multiple measures on each runner within a race, so I could see how ID:YEAR could be useful (though maybe also random effects wrt YEAR? not sure).

I wonder whether it helps to create a new IDYEAR column that you could use in place of the colon operator:

Dataset_1$IDYEAR <- with(Dataset_1, interaction(ID, YEAR, drop = TRUE))

Then use (1 + TIME | IDYEAR) in the model specification instead of ID:YEAR and expand_grid() with IDYEAR instead of with YEAR.

Thank you very much for the suggestion, this is what I somehow had in mind but I could figure out how to specify a unique combination between ID and YEAR. So just to make sure that I understand correctly, is the argument below what you propose?

newdata = expand_grid(linear_year=seq(0,1, by=0.25), TIME = seq(0, 8, by = 0.25), ID = unique(Dataset_1$ID), IDYEAR=unique(Dataset_1$IDYEAR)),
re_formula =~ (1+TIME | ID)+(1+TIME | IDYEAR) )

Yes, that looks right. You might verify that the model with IDYEAR yields similar estimates to the one with ID:YEAR, but I think they are the same model.

Yeap, I’ve already done that; both model specifications yield very similar estimates.
Thank you for even bothering respond to a 3weeks old post, really appreciated!!