Understanding brms syntax using |<ID>| and how that affects correlation

Hi all,

I’m looking to get a deeper understanding of the hierarchical linear model I’ve built in brms to see if there’s any way I can improve it / to see if it’s doing what I think it’s doing. To describe the model and data, I have a variable, CHG, which I assume changes as a function of plasma concentration, PKCONC, and visit, AVISIT, for a series of subjects, who all have unique SUBJIDs.

For anyone trying to get a better sense of the data, PKCONC and CHG are both continuous variables, while AVISIT is categorical and can take on one of two values, “Visit 1” or “Visit 2”.

I started with a simple model bf(CHG ~ 1 + PKCONC + (1 + PKCONC | SUBJID) (full code below) which to me means that I’m estimating a population level intercept and slope (as a function of PKCONC) and that I’m also estimating individual level intercepts and slopes (as a function of PKCONC), for each individual (each unique SUBJID). I then tried a bunch of other variations on that theme, and selected the best model using LOO-CV and WAIC.

# Define the model formula
base_formula <- bf(CHG ~ 1 + PKCONC + (1 + PKCONC | SUBJID)) 

# Fit the model using the brm function
base_fit <- brm(base_formula, 
                data = qtc2, 
                family = "gaussian", 
                cores = 4, 
                chains = 4, 
                file = "base", 
                save_all_pars = TRUE)

The final model was bf(CHG ~ 1 + PKCONC * AVISIT + (1 + PKCONC * AVISIT | SUBJID) with a standard deviation estimated for each visit level (below) but I’m wondering if I should use a different syntax to specify the IDs following here. Would it make sense to instead use CHG ~ 1 + PKCONC * AVISIT + (1 + PKCONC * AVISIT | gr(SUBJID, id = "i")), and how does this structurally change the model (I’m assuming that it now treats data for each SUBJID as correlated)?

final_fit <- brm(bf(CHG ~ 1 + PKCONC * AVISIT + (1 + PKCONC * AVISIT | SUBJID), 
                     sigma ~ AVISIT),
           data = qtc2,
           family = gaussian(),
           # prior = NULL, 
           prior = c(set_prior("normal(0, 1)", class = "b"), # Fixed effects, usually small, around 0 
                     set_prior("exponential(1)", class = "sd"), # Random effects standard deviations, usually not very uncertain 
                     set_prior("lkj(1)", class = "cor")), # Random effects correlations, not a ton of correlation to be expected 
           warmup = 4000, 
           iter = 8000,
           chains = 4,
           control = list(adapt_delta = 0.99),
           cores = 4) #, 
           # file = "final_fit")

Would it be syntactically better to instead group the individuals by ~ x + (1 | gr(subject, by = x)) or am I totally off the mark on that one?

Thank you very much and I really appreciate the package.

  • Operating System: Windows 10
  • brms Version: 2.14.4